PHASE TRANSITIONS FOR THE UNIFORM DISTRIBUTION
IN THE PML PROBLEM AND ITS BETHE APPROXIMATION* 
Abstract. The pattern maximum likelihood (PML) estimate, introduced by Orlitsky et al., is an
estimate of the multiset of probabilities in an unknown probability distribution p, the estimate
being obtained from n i.i.d. samples drawn from p. The PML estimate involves solving a difficult
optimization problem over the set of all probability mass functions (pmfs) of finite support. In this
paper, we describe an interesting phase transition phenomenon in the PML estimate: at a certain
sharp threshold, the uniform distribution goes from being a local maximum to being a local
minimum for the optimization problem in the estimate. We go on to consider the question of whether a
similar phase transition phenomenon also exists in the Bethe approximation of the PML estimate, the
latter being an approximation method with origins in statistical physics. We show that the answer to
this question is a qualified “Yes”. Our analysis involves the computation of the mean and variance
of the (i, j)th entry, $a_{i,j}$, in a random $k \times k$ non-negative integer matrix A with row and column
sums all equal to M, drawn according to a distribution that assigns to A a probability proportional to

$$\prod_{i,j} \frac{(M - a_{i,j})!}{a_{i,j}!}.$$

1. Introduction 

Consider the estimation problem in which, given a sequence of n i.i.d. samples from a fixed but
unknown underlying probability distribution p, we are required to estimate the multiset of
probabilities in p. In particular, we need not determine the correspondence between the symbols of the
underlying alphabet and the probabilities in the multiset. Such a problem arises naturally in the
context of universal compression of large-alphabet sources [1], and has several other applications,
for example, population estimation from a small number of samples [2]. The multiset of empirical
frequencies of the symbols observed in the n samples is a straightforward estimate of the multiset of
probabilities in p; this estimate corresponds to the maximum likelihood (ML) estimate of p.
However, when the sample size, n, is smaller than the size of the support of the underlying distribution
p, the ML estimate may not give a good estimate of the multiset of probabilities in p. An alternative
estimate that has been proposed for this regime is the pattern maximum likelihood (PML) estimate,
introduced by Orlitsky et al. [1], [2] and described below.

The pattern $\psi$ or $\psi(x^n)$ of a sequence $x^n = x_1, \ldots, x_n$ is a data structure that keeps track of the
order of occurrence and the multiplicities of the distinct symbols in the sequence $x^n$; for a precise
definition, see Section 2. The pattern maximum likelihood (PML) distribution of a pattern $\psi$ is
the multiset of probabilities that maximizes the probability of observing a sequence with pattern
$\psi$. It has been argued [1], [2], [?] that for a sequence $x^n$ sampled from an unknown underlying
probability distribution p, the PML distribution of $\psi(x^n)$ is a good estimate of the multiset of
probabilities in p, even in situations where n is much smaller than the support size of p. However,
for the purposes of this paper, we view the PML distribution purely as an interesting mathematical
object.

The problem of determining the PML distribution (henceforth termed the “PML problem”) of a
given pattern $\psi$ appears to be computationally hard [?]–[9]. In part, this is because the underlying


*This work was presented in part at the 2013 IEEE Information Theory Workshop held in Seville, Spain, Sept. 9-13, 
optimization problem is not convex. It turns out that the PML problem can be very well
approximated by its Bethe approximation [10], [11], which in this case is a convex optimization problem.
The Bethe approximation is a technique with roots in statistical physics. The optimization
problem in the Bethe approximation can usually be solved highly efficiently using belief propagation
algorithms [12].

In this paper, we are concerned with a remarkable phase transition phenomenon^1 observed in the
PML problem. For a positive integer k, let $U_k$ denote the uniform distribution on k symbols. Given
a pattern $\psi$, we can explicitly compute a quantity $T(\psi)$ such that for all $k < T(\psi)$, $U_k$ is a local
maximum, among all distributions p with support size k, for the optimization problem within the
PML problem; and for all $k > T(\psi)$, $U_k$ is a local minimum. On the basis of this observation, we
proposed in [13] a heuristic algorithm for determining whether or not the uniform distribution is the
PML distribution of a given pattern $\psi$.

Given that the Bethe approximation is a very good proxy for the PML distribution, it is natural to 
ask whether the phase transition phenomenon described above extends to the Bethe approximation 
as well. We are able to give a qualified affirmative answer to this question. Our answer is given in
terms of a sequence of “degree-M optimization problems” such that the degree-1 problem is the
original PML problem, and as $M \to \infty$, we obtain the Bethe approximation. We show that for all
sufficiently large M, the degree-M optimization problems admit a phase transition phenomenon
very similar to that described above for the PML problem. While this falls just short of proving that
the Bethe approximation itself admits such a phase transition, it lends strong support in favour of
this assertion.

The bulk of our proof of the existence of phase transitions in the degree-M optimization problems
involves analyzing a certain probability distribution, denoted by $Q_{k,M}$, on the set of $k \times k$
non-negative integer matrices with all row and column sums equal to M. This probability distribution
and our analysis of it via a discrete Gaussian approximation may be of independent interest.

The remainder of this paper is organized as follows. In Section 2, we provide the definitions
needed to describe the PML problem, after which we state and prove the corresponding phase
transition phenomenon (Theorem 1). The Bethe approximation is described in Section 3. This
section also explains the notion of “degree-M lifted permanents” defined by Vontobel [14], which
is used to define our degree-M optimization problems. Section 4 contains a precise statement
(Theorem 3) of the phase transition phenomenon in the degree-M problems, the proof of which
occupies much of the rest of the paper. In particular, Section 5 collects together the properties of
the probability distribution $Q_{k,M}$ that are used in the proof. The paper concludes in Section 6 with a
discussion of the gap remaining in a rigorous proof of the phase transition phenomenon in the Bethe
approximation. Some of the more technical proofs from Sections 3–5 are presented in appendices.

2. The PML Problem and a Phase Transition Phenomenon 

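We use $\mathbb{Z}_+$ and $\mathbb{Z}_{++}$, respectively, to denote the sets of non-negative and positive integers. For
$k \in \mathbb{Z}_{++}$, we use $[k]$ to denote the set $\{1, 2, \ldots, k\}$. For any countable set $\mathcal{X}$, we let $\Pi_{\mathcal{X}}$ denote the
set of all probability distributions on $\mathcal{X}$:

$$\Pi_{\mathcal{X}} = \Big\{ p = (p(x))_{x \in \mathcal{X}} : p(x) \ge 0 \ \ \forall x \in \mathcal{X}, \ \sum_{x \in \mathcal{X}} p(x) = 1 \Big\}.$$

For any $k \in \mathbb{Z}_{++}$, we let $U_k$ denote the uniform distribution on $[k]$.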

2.1. Patterns and PML. Given a sequence $x^n = x_1, \ldots, x_n$ over some alphabet, the pattern of
$x^n$ is the sequence $\psi = \psi_1, \psi_2, \ldots, \psi_n$ obtained by replacing each $x_j$ by the order of its first
occurrence in $x^n$ [4], [10]. More precisely, for each symbol x occurring in $x^n$, let $\nu(x)$ denote

^1 Our use of the term “phase transition” here is inspired by statistical physics, where the term is often used to describe
abrupt changes in behaviour of physical (especially, thermodynamical) systems.
the number of distinct symbols seen in the shortest prefix of $x^n$ that ends in the symbol x. Then,
$\psi_j = \nu(x_j)$ for $j = 1, 2, \ldots, n$. The pattern $\psi(x^n)$ is defined to have length n and size m, where m
is the number of distinct symbols in $x^n$. For example, the word “sleepless” has pattern 123342311,
which is of length 9 and size 4. We will canonically represent a pattern $\psi$ as $1^{\mu_1} 2^{\mu_2} \cdots m^{\mu_m}$,
where $\mu_j$ is the multiplicity of the symbol j, i.e., the number of times j appears, in $\psi$. Note that
$\mu_1 + \cdots + \mu_m = n$. The pattern $\psi$ in our example has canonical form $1^3 2^2 3^3 4^1$.
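As an illustration of the definition above, the following Python sketch computes the pattern and the multiplicities of a sequence. The function names `pattern` and `multiplicities` are our own choices for this example and do not come from the cited works.

```python
from collections import Counter


def pattern(seq):
    """Replace each symbol by the (1-based) order of its first occurrence."""
    first_seen = {}
    out = []
    for x in seq:
        if x not in first_seen:
            # nu(x): number of distinct symbols seen when x first appears
            first_seen[x] = len(first_seen) + 1
        out.append(first_seen[x])
    return out


def multiplicities(pat):
    """Return [mu_1, ..., mu_m], where mu_j counts occurrences of j in the pattern."""
    counts = Counter(pat)
    return [counts[j] for j in range(1, len(counts) + 1)]


psi = pattern("sleepless")
mu = multiplicities(psi)
```

Here `psi` is the list of digits of 123342311 and `mu` lists the exponents of the canonical form, which sum to the length n = 9.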

Let $\psi$ be a given pattern of length n, and let $p = (p(x))_{x \in \mathcal{X}}$ be a probability distribution over
a discrete (possibly countably infinite) set $\mathcal{X}$. The probability that n i.i.d. samples drawn from the
distribution p form a sequence with pattern $\psi$ is given by

$$P(\psi; p) = \sum_{x^n :\, \psi(x^n) = \psi} \ \prod_{i=1}^{n} p(x_i). \tag{1}$$

Clearly, all patterns $\psi$ with the same canonical form $1^{\mu_1} 2^{\mu_2} \cdots m^{\mu_m}$ have the same pattern
probability $P(\psi; p)$. Indeed, if $p = (p_1, \ldots, p_k) \in \Pi_{[k]}$ with $k \ge m$, then we can write

$$P(\psi; p) = \sum_{\sigma} \prod_{i=1}^{m} p_{\sigma(i)}^{\mu_i}, \tag{2}$$

where the summation runs over all one-to-one maps $\sigma : [m] \to [k]$.

The right-hand side of (2) can be expressed in alternative form using the notion of a permanent
of a matrix. The permanent of a real $k \times k$ matrix $\theta = (\theta_{i,j})$ is defined as

$$\mathrm{perm}(\theta) = \sum_{\pi} \prod_{i=1}^{k} \theta_{i,\pi(i)},$$

where the summation is over all permutations $\pi : [k] \to [k]$. With this, it can be verified that (2)
can be re-written as

$$P(\psi; p) = \frac{1}{(k-m)!} \, \mathrm{perm}(\theta(\psi; p)), \tag{3}$$

where $\theta(\psi; p)$ is the $k \times k$ matrix $(\theta_{i,j})$ with $\theta_{i,j} = p_i^{\mu_j}$; here, we set^2 $\mu_j = 0$ for $m + 1 \le j \le k$.
The term $\frac{1}{(k-m)!}$ in (3) comes from the fact that each one-to-one map $\sigma : [m] \to [k]$ in the sum (2)
can be extended to a permutation $\pi : [k] \to [k]$ in exactly $(k - m)!$ different ways.
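The equivalence of (2) and (3) can be checked numerically on small instances. The sketch below is a brute-force illustration only: it evaluates the sum over one-to-one maps directly and compares it with the permanent expression, assuming the convention $\theta_{i,j} = p_i^{\mu_j}$ with zero multiplicities for the padded columns. The function names are hypothetical, and the enumeration is exponential in k.

```python
from itertools import permutations
from math import factorial, isclose, prod


def pattern_prob_sum(mu, p):
    """Equation (2): sum over all one-to-one maps sigma: [m] -> [k]."""
    m, k = len(mu), len(p)
    return sum(
        prod(p[sigma[i]] ** mu[i] for i in range(m))
        for sigma in permutations(range(k), m)  # ordered m-subsets = injective maps
    )


def permanent(theta):
    """Permanent by direct enumeration of all permutations."""
    k = len(theta)
    return sum(
        prod(theta[i][pi[i]] for i in range(k)) for pi in permutations(range(k))
    )


def pattern_prob_perm(mu, p):
    """Equation (3): perm(theta) / (k - m)!, with theta[i][j] = p_i ** mu_j."""
    m, k = len(mu), len(p)
    mu_ext = list(mu) + [0] * (k - m)  # mu_j = 0 for the padded columns
    theta = [[p[i] ** mu_ext[j] for j in range(k)] for i in range(k)]
    return permanent(theta) / factorial(k - m)


mu_example = [2, 1]
p_example = [0.5, 0.3, 0.1, 0.1]
lhs = pattern_prob_sum(mu_example, p_example)
rhs = pattern_prob_perm(mu_example, p_example)
```

For the example values, `lhs` and `rhs` agree up to floating-point rounding.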

The PML probability of a pattern $\psi$ of size m is defined as

$$P^{\mathrm{PML}}(\psi) := \max_{p} P(\psi; p), \tag{4}$$

the maximum being taken over all discrete distributions p of support size at least m. Any
distribution that attains the maximum above is called a PML distribution of $\psi$, denoted by $p^{\mathrm{PML}}(\psi)$. For
the purposes of this paper, we will assume that the maximum is indeed attained by some discrete
distribution p.^3 In this case, there is always a PML distribution with finite support [?]. Hence, we
have

$$P^{\mathrm{PML}}(\psi) = \max_{k \ge m} \ \max_{p \in \Pi_{[k]}} P(\psi; p). \tag{5}$$

It should be pointed out that, for any $k \ge m$, since $P(\psi; p)$ is a continuous function of $p \in \Pi_{[k]}$,
as is evident from (2), it does attain its maximum on the compact set $\Pi_{[k]}$.

The problem of determining the PML distribution of a pattern seems to be computationally
difficult in general [?], [4]–[9]. Algorithms for approximating the PML distribution have been proposed
by Orlitsky et al. [4] and Vontobel [10]. Vontobel’s algorithm, in particular, uses the Bethe
approximation, about which we will have much more to say in Section 3.

^2 For consistency, we define $0^0 = 1$. This is also in keeping with the convention used in Definition 1 in Section 3 that
0 log 0 = 0.

^In general, to guarantee that the maximum is always attained, we must allow “mixed” distributions; see 
2.2. Phase Transition in the PML Problem. Consider now the potentially simpler decision
problem of determining whether or not the PML distribution of a given pattern is a uniform distribution.
A natural approach to this problem would be to find a test for whether, for any fixed $k \ge m$, the
uniform distribution achieves the inner maximum in (5). In attempting this approach, we discovered
a striking phase transition phenomenon in the PML problem. To describe this, we introduce some
notation. For a pattern $\psi$ of size m, and an integer $k \ge m$, let $\mathcal{P}_k^{\psi} : \Pi_{[k]} \to [0, 1]$ be the function
defined by the mapping $p \mapsto P(\psi; p)$. The phase transition phenomenon is made precise in the
following theorem.


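Theorem 1. For a pattern $\psi$ of length n with canonical form $1^{\mu_1} 2^{\mu_2} \cdots m^{\mu_m}$, $m \ge 2$, define

$$T(\psi) = \frac{n^2 - n}{\sum_{i=1}^{m} \mu_i^2 - n}. \tag{6}$$

Then, for all integers $k \ge m$, the following holds: when $k < T(\psi)$, the uniform distribution $U_k$ is
a local maximum of the function $\mathcal{P}_k^{\psi} : p \mapsto P(\psi; p)$ on $\Pi_{[k]}$, and when $k > T(\psi)$, $U_k$ is a local minimum.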


We clarify a point concerning the statement of the theorem. Note that $T(\psi)$ is finite iff $\psi \ne 123 \ldots n$.
When $\psi = 123 \ldots n$, we take $T(\psi)$ to be $\infty$.
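The threshold in (6) depends only on the multiplicities and is straightforward to evaluate. The following sketch, with hypothetical function names, computes $T(\psi)$ exactly using rational arithmetic and reports which side of the threshold a given k falls on; it says nothing about the boundary case $k = T(\psi)$, which Theorem 1 does not cover.

```python
from fractions import Fraction
from math import inf


def pml_threshold(mu):
    """T(psi) from equation (6); infinite when every multiplicity equals 1."""
    n = sum(mu)
    denom = sum(x * x for x in mu) - n
    return inf if denom == 0 else Fraction(n * n - n, denom)


def uniform_is(mu, k):
    """Classify U_k according to Theorem 1 (requires k >= m)."""
    if k < len(mu):
        raise ValueError("k must be at least the pattern size m")
    t = pml_threshold(mu)
    if k < t:
        return "local maximum"
    if k > t:
        return "local minimum"
    return "undetermined at threshold"


t_1122 = pml_threshold([2, 2])             # pattern 1122
t_sleepless = pml_threshold([3, 2, 3, 1])  # pattern of "sleepless"
```

For the pattern 1122 this gives the value 3 quoted in the text, and for the pattern of “sleepless” it gives 36/7.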


Proof of Theorem 1. The proof approach is based on that of Theorem 20 in [14]. Let $p = U_k$, so
that p is in the interior of the simplex $\Pi_{[k]}$. Pick an arbitrary direction $\xi \in \mathbb{R}^k \setminus \{0\}$, normalized
so that $\|\xi\|_2 = 1$, such that for all t within a sufficiently small interval around 0, the point $p(t) =
p + t\xi$ continues to lie within $\Pi_{[k]}$. Note that this implies that $\sum_{j=1}^{k} \xi_j = 0$. Consider the function
$g(t) = P(\psi; p(t))$. We will show that, independent of the choice of $\xi$, we have $g'(0) = 0$,
$g''(0) < 0$ if $k < T(\psi)$, and $g''(0) > 0$ if $k > T(\psi)$. This clearly suffices to prove the theorem.

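Now, from (2), $g(t)$ is expressible as $\sum_{\sigma} g_{\sigma}(t)$, where $g_{\sigma}(t) = \prod_{i=1}^{m} (p_{\sigma(i)} + t \xi_{\sigma(i)})^{\mu_i}$.
Differentiation, together with the fact that $p_j = \frac{1}{k}$ for all j, yields $g_{\sigma}'(0) = \big(\frac{1}{k}\big)^{n-1} \sum_{i=1}^{m} \mu_i \xi_{\sigma(i)}$. Hence,

$$g'(0) = \Big(\frac{1}{k}\Big)^{n-1} \sum_{i=1}^{m} \mu_i \sum_{\sigma} \xi_{\sigma(i)}.$$

For any fixed $i \in [m]$, the inner summation $\sum_{\sigma} \xi_{\sigma(i)}$ can be evaluated as follows. As $\sigma$ ranges over
all one-to-one maps from $[m]$ to $[k]$, for each $j \in [k]$, $\sigma(i)$ takes the value j exactly $\frac{(k-1)!}{(k-m)!}$ times.
Hence, $\sum_{\sigma} \xi_{\sigma(i)} = \frac{(k-1)!}{(k-m)!} \sum_{j=1}^{k} \xi_j = 0$ by choice of $\xi$. Thus, $g'(0) = 0$.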

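Next, we compute $g''(0)$. Straightforward computations yield

$$g_{\sigma}''(0) = \Big(\frac{1}{k}\Big)^{n-2} \left[ \Big( \sum_{i=1}^{m} \mu_i \xi_{\sigma(i)} \Big)^{2} - \sum_{i=1}^{m} \mu_i \xi_{\sigma(i)}^{2} \right].$$

Re-write the term within square brackets as

$$\sum_{i=1}^{m} (\mu_i^2 - \mu_i) \, \xi_{\sigma(i)}^{2} + \sum_{(i,\ell) :\, i \ne \ell} \mu_i \mu_\ell \, \xi_{\sigma(i)} \xi_{\sigma(\ell)}.$$

Summing over all one-to-one maps $\sigma : [m] \to [k]$, we obtain

$$g''(0) = \Big(\frac{1}{k}\Big)^{n-2} \left[ \sum_{i=1}^{m} (\mu_i^2 - \mu_i) \sum_{\sigma} \xi_{\sigma(i)}^{2} + \sum_{(i,\ell) :\, i \ne \ell} \mu_i \mu_\ell \sum_{\sigma} \xi_{\sigma(i)} \xi_{\sigma(\ell)} \right].$$

As above, $\sum_{\sigma} \xi_{\sigma(i)}^2 = \frac{(k-1)!}{(k-m)!} \sum_{j=1}^{k} \xi_j^2$, which, since $\|\xi\|_2 = 1$, means that $\sum_{\sigma} \xi_{\sigma(i)}^2 = \frac{(k-1)!}{(k-m)!}$.
Similarly, for $i \ne \ell$, $\sum_{\sigma} \xi_{\sigma(i)} \xi_{\sigma(\ell)} = \frac{(k-2)!}{(k-m)!} \sum_{(s,t) \in [k]^2 :\, s \ne t} \xi_s \xi_t$. We also have
$0 = \big( \sum_{j=1}^{k} \xi_j \big)^2 = \sum_{j=1}^{k} \xi_j^2 + \sum_{(s,t) \in [k]^2 :\, s \ne t} \xi_s \xi_t$,
from which we obtain $\sum_{j=1}^{k} \xi_j^2 = - \sum_{(s,t) \in [k]^2 :\, s \ne t} \xi_s \xi_t$. Hence, $\sum_{\sigma} \xi_{\sigma(i)} \xi_{\sigma(\ell)} = - \frac{(k-2)!}{(k-m)!}$, again
using the fact that $\|\xi\|_2 = 1$. Putting it all together, we find that

$$g''(0) = C \left[ (k-1) \sum_{i=1}^{m} (\mu_i^2 - \mu_i) - \sum_{(i,\ell) :\, i \ne \ell} \mu_i \mu_\ell \right],$$

where $C = \big(\frac{1}{k}\big)^{n-2} \frac{(k-2)!}{(k-m)!}$ is a positive constant independent of $\xi$. Further simplification using the
fact that $\sum_{i=1}^{m} \mu_i = n$ yields

$$g''(0) = C \left[ k \Big( \sum_{i=1}^{m} \mu_i^2 - n \Big) - (n^2 - n) \right],$$

from which the desired result follows. □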


Theorem 1 shows that the uniform distribution $U_k$ is either a local maximum or a local minimum
of $\mathcal{P}_k^{\psi}$ for all integers $k \ge m$, except perhaps at the threshold $T(\psi)$. Indeed, if $T(\psi)$ happens to
be an integer, then for $k = T(\psi)$, it is possible that $U_k$ is not a local extremum, but only a saddle
point. For example, for $\psi = 1122$, we have $T(\psi) = 3$, and it may be verified that $U_3$ is not a local
extremum for $\mathcal{P}_3^{\psi}$.

As a simple corollary of the theorem, we see that a necessary condition for the PML distribution
of a pattern $\psi$ to be uniform is that $T(\psi) > m$. However, this condition is not sufficient in general.
In [13], we derive a slightly stronger necessary condition using Theorem 1, which is used as the
basis for a heuristic algorithm that determines whether or not a given pattern has a uniform PML
distribution.

The intent of this paper, however, is to investigate whether the phase transition phenomenon 
reported in Theorem 1 extends to the Bethe approximation of the PML problem, which we describe
in the next section. 


3. The Bethe Approximation 

The Bethe approximation is a method whose origins lie in statistical physics [15], [16]. In the
interest of brevity, we describe this approximation only in the context of the PML problem. The
motivation and justification behind the definitions in this section are discussed in detail in [14].

3.1. The Bethe PML Problem. From (3), we see that computing the pattern probability $P(\psi; p)$
is equivalent to computing the permanent of the matrix $\theta(\psi; p)$. It is well known that computing
the permanent of a matrix is hard in general; formally, the problem is #P-complete [17]. Many
approximation algorithms have been developed for this problem (e.g., [18], [19]), of which the
ones based on the Bethe approximation [20], [21], [14] are relevant to us.

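Let $\mathcal{D}_k$ denote the set of $k \times k$ doubly stochastic matrices. In the following definition, we use
the convention that 0 log 0 = 0.

Definition 1 ([14, Corollary 15]). The Bethe permanent of a non-negative $k \times k$ matrix $\theta = (\theta_{i,j})$,
with $\theta_{i,j} \ge 0$ for all i, j, is defined as

$$\mathrm{perm}_B(\theta) := \max_{\Gamma \in \mathcal{D}_k} \exp\big( -F_B(\Gamma; \theta) \big),$$

where for $\Gamma = (\gamma_{i,j}) \in \mathcal{D}_k$, we have $F_B(\Gamma; \theta) = U_B(\Gamma; \theta) - H_B(\Gamma)$, with

$$U_B(\Gamma; \theta) = - \sum_{i,j} \gamma_{i,j} \log \theta_{i,j},$$

$$H_B(\Gamma) = - \sum_{i,j} \gamma_{i,j} \log \gamma_{i,j} + \sum_{i,j} (1 - \gamma_{i,j}) \log (1 - \gamma_{i,j}).$$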










CHUN LAM CHAN, WINSTON FERNANDES, NAVIN KASHYAP, AND MANJUNATH KRISHNAPUR 


The function $F_B(\Gamma; \theta)$ in the above definition is called the Bethe free energy. If the pair $(\Gamma, \theta)$
is such that $\gamma_{i,j} > 0$ but $\theta_{i,j} = 0$ for some (i, j), we define $F_B(\Gamma; \theta) = \infty$, and correspondingly,
$\exp(-F_B(\Gamma; \theta)) = 0$. With these definitions, $\exp(-F_B(\Gamma; \theta))$ is a continuous function of $(\Gamma, \theta)$,
so that for any fixed $\theta$, $\exp(-F_B(\Gamma; \theta))$ attains a maximum on the compact set $\mathcal{D}_k$. Hence,
$\mathrm{perm}_B(\theta)$ is well-defined.
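To make the definition concrete, the sketch below evaluates the Bethe free energy $F_B = U_B - H_B$ at a given doubly stochastic matrix, using the conventions 0 log 0 = 0 and $F_B = \infty$ when some $\gamma_{i,j} > 0$ meets $\theta_{i,j} = 0$. It is only an evaluation routine with names of our choosing. It does not perform the maximization over $\mathcal{D}_k$, so $\exp(-F_B)$ at any particular matrix is merely a lower bound on the Bethe permanent.

```python
from math import exp, inf, isclose, log


def xlogx(x):
    """x * log(x), with the convention 0 log 0 = 0."""
    return 0.0 if x == 0 else x * log(x)


def bethe_free_energy(gamma, theta):
    """F_B(gamma; theta) = U_B - H_B for a k x k doubly stochastic gamma."""
    k = len(theta)
    u_b = 0.0
    h_b = 0.0
    for i in range(k):
        for j in range(k):
            g, th = gamma[i][j], theta[i][j]
            if g > 0 and th == 0:
                return inf  # convention: exp(-F_B) = 0 in this case
            if g > 0:
                u_b -= g * log(th)
            h_b += -xlogx(g) + xlogx(1 - g)
    return u_b - h_b


identity = [[1.0, 0.0], [0.0, 1.0]]
theta_example = [[2.0, 1.0], [1.0, 3.0]]
lower_bound = exp(-bethe_free_energy(identity, theta_example))
```

With the identity matrix as $\Gamma$, the entropy term vanishes and `lower_bound` equals the product of the diagonal entries, 6, which is below the permanent 7 of `theta_example`.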

For positive matrices $\theta$, we can write

$$\mathrm{perm}_B(\theta) = \exp\Big( - \min_{\Gamma \in \mathcal{D}_k} F_B(\Gamma; \theta) \Big).$$

Vontobel [14, Corollary 23] showed that for any positive matrix $\theta$, $F_B(\Gamma; \theta)$ is a convex function
of $\Gamma \in \mathcal{D}_k$, so that $\min_{\Gamma \in \mathcal{D}_k} F_B(\Gamma; \theta)$ is a convex program. Vontobel further proved that the
sum-product algorithm (belief propagation) can be used to find this minimum, and hence $\mathrm{perm}_B(\theta)$,
highly efficiently. Since the Bethe permanent is often a very good proxy for the actual permanent
[?], [10], [?], having an efficient algorithm to compute it is particularly useful.

For a pattern $\psi$ of size m and a probability distribution $p \in \Pi_{[k]}$, $k \ge m$, we define, in analogy
with (3), the quantity

$$P_B(\psi; p) := \frac{1}{(k-m)!} \, \mathrm{perm}_B(\theta(\psi; p)). \tag{7}$$

We then have $0 \le P_B(\psi; p) \le P(\psi; p) \le 1$, the inequalities holding for the following reasons:

• the first inequality is simply a consequence of the non-negativity of the Bethe permanent;

• the second inequality is because of the fact that $\mathrm{perm}(\theta) \ge \mathrm{perm}_B(\theta)$ for any
non-negative matrix $\theta$, an inequality proved by Gurvits [22], [23];

• the last inequality is a consequence of the fact that $P(\psi; p)$ is a probability.

Thus, $P_B(\psi; p)$ can be viewed as a probability as well.

With this, we define, in analogy with (4) and (5), the Bethe PML probability of a pattern $\psi$ to be

$$P_B^{\mathrm{PML}}(\psi) := \sup_{k \ge m} \ \max_{p \in \Pi_{[k]}} P_B(\psi; p). \tag{8}$$

A couple of clarifications on this definition may be needed. One is that for any positive integer k,
$\max_{p \in \Pi_{[k]}} P_B(\psi; p)$ is well-defined. This is because $\chi(\Gamma, p) := \exp\{-F_B(\Gamma; \theta(\psi; p))\}$, as a
function of $(\Gamma, p)$, is continuous on the compact set $\mathcal{D}_k \times \Pi_{[k]}$. Consequently, $\mathrm{perm}_B(\theta(\psi; p)) =
\max_{\Gamma \in \mathcal{D}_k} \chi(\Gamma, p)$ is a continuous function of p. Hence, $P_B(\psi; p)$, being a continuous function of
p, must attain a maximum on the compact set $\Pi_{[k]}$.

A second clarification is that it is not known whether the supremum in (8) is always achieved
at some finite k, although empirical evidence suggests that this may indeed be the case [11].
Empirically again, the Bethe PML distribution, defined as any distribution p for which $P_B(\psi; p) =
P_B^{\mathrm{PML}}(\psi)$, is a very good approximation of the PML distribution of a pattern $\psi$. The “Bethe PML
problem” of determining the Bethe PML distribution is also considerably easier to solve
numerically [?], [?].

The question we are interested in addressing is whether the Bethe PML problem exhibits a phase
transition analogous to that described for the PML problem in Theorem 1. To answer this, we
must understand when the uniform distribution $U_k$ is a local maximum or a local minimum in $\Pi_{[k]}$
for the function $p \mapsto \mathrm{perm}_B(\theta(\psi; p))$. A direct approach analogous to that used in the proof
of Theorem 1 seems difficult as we only have a description of $\mathrm{perm}_B$ as a solution to a convex
optimization problem. Instead, we take an indirect approach via the degree-M lifted permanents
discussed next.

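3.2. Degree-M Lifted Permanents. As an alternative to defining the Bethe permanent as a
solution to an optimization problem, Vontobel gave a combinatorial characterization of this quantity,
which we describe here. Let $\theta = (\theta_{i,j})$ be a given $k \times k$ matrix with non-negative entries. For a
positive integer M, let $\mathcal{P}_M$ denote the set of all $M \times M$ permutation matrices. Further, consider
the set of all $kM \times kM$ matrices of the form

$$\Lambda = \begin{pmatrix} P^{(1,1)} & P^{(1,2)} & \cdots & P^{(1,k)} \\ P^{(2,1)} & P^{(2,2)} & \cdots & P^{(2,k)} \\ \vdots & \vdots & & \vdots \\ P^{(k,1)} & P^{(k,2)} & \cdots & P^{(k,k)} \end{pmatrix}$$

with $P^{(i,j)} \in \mathcal{P}_M$ for all i, j. For a $\Lambda$ as above, define

$$\theta \circledast \Lambda = \begin{pmatrix} \theta_{1,1} P^{(1,1)} & \theta_{1,2} P^{(1,2)} & \cdots & \theta_{1,k} P^{(1,k)} \\ \theta_{2,1} P^{(2,1)} & \theta_{2,2} P^{(2,2)} & \cdots & \theta_{2,k} P^{(2,k)} \\ \vdots & \vdots & & \vdots \\ \theta_{k,1} P^{(k,1)} & \theta_{k,2} P^{(k,2)} & \cdots & \theta_{k,k} P^{(k,k)} \end{pmatrix}. \tag{9}$$

Definition 2 ([14, Definition 38]). The degree-M lifted permanent of $\theta$ is defined to be

$$\mathrm{perm}_{B,M}(\theta) := \big\langle \mathrm{perm}(\theta \circledast \Lambda) \big\rangle^{1/M}, \tag{10}$$

where the angular brackets represent the arithmetic average of $\mathrm{perm}(\theta \circledast \Lambda)$ as $\Lambda$ ranges over the
matrices of the above form. Equivalently, $\langle \mathrm{perm}(\theta \circledast \Lambda) \rangle$ is the expected value of $\mathrm{perm}(\theta \circledast \Lambda)$,
the expectation being taken over $\Lambda$ chosen uniformly at random from this set.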

Note that when M = 1, $\mathrm{perm}_{B,1}(\theta)$ is equal to $\mathrm{perm}(\theta)$. At the other extreme, as $M \to \infty$,
Vontobel [14, Theorem 39] has shown the following identity:

$$\limsup_{M \to \infty} \, \mathrm{perm}_{B,M}(\theta) = \mathrm{perm}_B(\theta). \tag{11}$$


Thus, degree-M lifted permanents interpolate between perm(0) and perm^(0). The advantage 
of using degree-M lifted permanents as an indirect means of understanding the Bethe permanent is 
that they can be expressed in a form that is more amenable to analysis. 


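Proposition 2. For any $k \times k$ matrix $\theta = (\theta_{i,j})$ and any positive integer M, we have

$$\big[ \mathrm{perm}_{B,M}(\theta) \big]^{M} = (M!)^{2k - k^2} \sum_{(a_{i,j}) \in \mathcal{A}_{k,M}} \ \prod_{(i,j) \in [k]^2} \theta_{i,j}^{a_{i,j}} \, \frac{(M - a_{i,j})!}{a_{i,j}!},$$

where $\mathcal{A}_{k,M}$ denotes the set of all $k \times k$ non-negative integer matrices whose row and column sums
are all equal to M.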


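Using multinomial coefficients, the identity above can be expressed in an alternative, more
evocative form^4:

$$\big[ \mathrm{perm}_{B,M}(\theta) \big]^{M} = \sum_{(a_{i,j}) \in \mathcal{A}_{k,M}} \Big( \prod_{(i,j) \in [k]^2} \theta_{i,j}^{a_{i,j}} \Big) \, \frac{\prod_{i=1}^{k} \binom{M}{a_{i,1}, \ldots, a_{i,k}} \ \prod_{j=1}^{k} \binom{M}{a_{1,j}, \ldots, a_{k,j}}}{\prod_{(i,j) \in [k]^2} \binom{M}{a_{i,j}}}. \tag{12}$$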
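For small k and M, the sum in Proposition 2 can be evaluated by brute force, which gives a convenient sanity check. The sketch below enumerates the set $\mathcal{A}_{k,M}$ and evaluates $[\mathrm{perm}_{B,M}(\theta)]^M$ in exact rational arithmetic. The helper names are ours, the enumeration is exponential and intended only for tiny instances, and the formula used is the one stated in Proposition 2 (as reconstructed here from the definitions of the paper).

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def contingency_matrices(k, M):
    """Yield every k x k non-negative integer matrix with all row/column sums M."""
    def extend(rows, col_left):
        if len(rows) == k - 1:
            # The last row is forced by the remaining column sums.
            yield rows + [tuple(col_left)]
            return
        for row in compositions(M, k):
            if all(r <= c for r, c in zip(row, col_left)):
                yield from extend(rows + [row],
                                  [c - r for c, r in zip(col_left, row)])
    yield from extend([], [M] * k)


def lifted_perm_power(theta, M):
    """[perm_{B,M}(theta)]^M via the sum over A_{k,M} in Proposition 2."""
    k = len(theta)
    total = Fraction(0)
    for A in contingency_matrices(k, M):
        term = Fraction(1)
        for i in range(k):
            for j in range(k):
                a = A[i][j]
                term *= Fraction(theta[i][j]) ** a
                term *= Fraction(factorial(M - a), factorial(a))
        total += term
    return Fraction(factorial(M)) ** (2 * k - k * k) * total


def permanent(theta):
    """Permanent by direct enumeration of all permutations."""
    k = len(theta)
    return sum(prod(theta[i][pi[i]] for i in range(k))
               for pi in permutations(range(k)))


theta2 = [[1, 2], [3, 4]]
degree1 = lifted_perm_power(theta2, 1)
degree2 = lifted_perm_power(theta2, 2)
```

For M = 1 the value coincides with the ordinary permanent, as noted after Definition 2. For the 2 x 2 example with M = 2, the sum reduces to $x^2 + xy + y^2$ with $x = \theta_{1,1}\theta_{2,2}$ and $y = \theta_{1,2}\theta_{2,1}$.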


Proposition 2 is proved in Appendix A.


^4 It is also possible to recover this form from Lemma 29 in [24].
4. Phase Transition in the Bethe PML Problem

As mentioned at the end of Section 3.1, we take an indirect approach, via degree-M lifted
permanents, to the question of the existence of a phase transition phenomenon in the Bethe PML problem.
This approach is based on the intuition that the large-M behaviour of these lifted permanents will,
by virtue of (11), shed light on the behaviour of the Bethe permanent. With this program in mind, we
define for a pattern $\psi$ of size $m \ge 2$, and integers $k \ge m$ and $M \ge 1$, a function $\mathcal{P}_{k,M}^{\psi} : \Pi_{[k]} \to \mathbb{R}_+$
that maps $p \in \Pi_{[k]}$ to $\mathrm{perm}_{B,M}(\theta(\psi; p))$. Recall that $U_k$ denotes the uniform distribution on $[k]$.

The aim of this section is to prove the following result.


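Theorem 3. Let $\psi$ be a pattern of length n having canonical form $1^{\mu_1} 2^{\mu_2} \cdots m^{\mu_m}$, $m \ge 2$. There
is a threshold $T_B(\psi)$ such that for all integers $k \ge m$, the following holds for all sufficiently large
M:

• when $k < T_B(\psi)$, $U_k$ is a local maximum for the function $\mathcal{P}_{k,M}^{\psi} : p \mapsto \mathrm{perm}_{B,M}(\theta(\psi; p))$; and

• when $k > T_B(\psi)$, $U_k$ is a local minimum for $\mathcal{P}_{k,M}^{\psi}$.

When m = 2, the threshold $T_B(\psi)$ may be chosen as follows:

$$T_B(\psi) = \begin{cases} \infty & \text{if } \mu_1 = \mu_2 = 1 \\ 2 + \delta & \text{if } \mu_1 = \mu_2 > 1 \\ 1 + \delta & \text{otherwise} \end{cases}$$

for any $\delta \in (0, 1)$.

When $m \ge 3$, $T_B(\psi)$ may be chosen to closely mimic the threshold $T(\psi)$ of Theorem 1 in the
following sense:

• if $T(\psi) = \infty$ (which happens iff $\psi = 123 \ldots n$), then $T_B(\psi) = \infty$;

• if $T(\psi) < \frac{\sqrt{n} + 1}{\sqrt{n} - 1}$, then we may choose $T_B(\psi) = T(\psi)$;

• in all other cases, we may choose

$$T_B(\psi) = \frac{U + n^2 - 2n + \sqrt{(n^2 + 2n - U)^2 - 4n^3}}{2(U - n)},$$

where $U = \sum_{i=1}^{m} \mu_i^2$; in this case, $T(\psi) - 1 < T_B(\psi) < T(\psi)$ holds.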


The theorem does not explicitly give a comparison between the thresholds $T(\psi)$ and $T_B(\psi)$ in
the case when m = 2. This has been done only so that a clean statement of the result could be
given. Indeed, there is a close relationship between the two thresholds even in this case: it can be
shown using Theorem 1 that when m = 2,

$$T(\psi) = \begin{cases} \infty & \text{if } \mu_1 = \mu_2 = 1 \\ 2 + \frac{1}{\mu_1 - 1} & \text{if } \mu_1 = \mu_2 > 1 \end{cases}$$

and $T(\psi)$ lies in the interval (1, 3] otherwise.

In summary, Theorem 3 strongly indicates that the Bethe PML problem exhibits a phase
transition phenomenon very similar to that proved in Theorem 1 for the PML problem. Unfortunately,
this does not quite prove that there is indeed such a phase transition in the Bethe PML problem. We
make some remarks concerning this in Section 6.

The rest of this section is devoted to a proof of Theorem 3. The proof proceeds along the same
lines as that of Theorem 1, except that the calculations are messier. To preserve the flow of this
section, we have moved the proofs of some intermediate lemmas, which mainly involve tedious
calculations, to the appendices. Also, Proposition 5 below, which is also an intermediate step in the
proof of Theorem 3 but which could be considered an interesting result in its own right, is proved
in Section 5.
Let $p = U_k$, and pick an arbitrary direction $\xi \in \mathbb{R}^k \setminus \{0\}$, normalized so that $\|\xi\|_2 = 1$, such
that for all t within a sufficiently small interval around 0, the point $p(t) = p + t\xi$ continues to lie
within $\Pi_{[k]}$. Note that this implies that $\sum_{j=1}^{k} \xi_j = 0$.

Given a pattern $\psi$ with multiplicities $(\mu_1, \ldots, \mu_m)$, $m \ge 2$, and integers $k \ge m$ and $M \ge 1$,
define the function $G_{k,M}(t) = \mathcal{P}_{k,M}(p(t)) = \mathrm{perm}_{B,M}(\theta(\psi; p(t)))$.^5 As in the proof of Theorem 1, the idea is to show
that $G'_{k,M}(0) = 0$, and that for a suitable choice of $T_B$ independent of $\xi$, the sign of $G''_{k,M}(0)$
depends, for all sufficiently large M, only on whether $k < T_B$ or $k > T_B$. We will in fact prove
the statement about the second derivative in the following equivalent form: there exists a threshold
$T_B$ independent of $\xi$ such that

$$\lim_{M \to \infty} G''_{k,M}(0) < 0 \ \text{ for all } k < T_B, \quad \text{and} \quad \lim_{M \to \infty} G''_{k,M}(0) > 0 \ \text{ for all } k > T_B. \tag{13}$$

It is straightforward to show that $G'_{k,M}(0) = 0$; a proof of this will be given in Appendix B as
part of the proof of Lemma 4 below. To express $G''_{k,M}(0)$, we consider once again^6 the set, $\mathcal{A}_{k,M}$,
of $k \times k$ non-negative integer matrices all of whose row and column sums are equal to M. For
$A = (a_{i,j}) \in \mathcal{A}_{k,M}$, define

$$w(A) := \prod_{(i,j) \in [k]^2} \frac{(M - a_{i,j})!}{a_{i,j}!}, \tag{14}$$

and let

$$Z_{k,M} := \sum_{A \in \mathcal{A}_{k,M}} w(A). \tag{15}$$

Then, $Q_{k,M}(A) := \frac{1}{Z_{k,M}} w(A)$ defines a probability distribution on $\mathcal{A}_{k,M}$. We will study this
probability distribution in more detail in the next section. For now, we use it to give an expression
for $G''_{k,M}(0)$.

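Lemma 4. We have

$$G''_{k,M}(0) = \Big[ (M!)^{2k - k^2} Z_{k,M} \Big]^{1/M} \frac{1}{k^{n}} \left[ \frac{k^3 \, \mathrm{Var}_{k,M}(a_{1,1})}{(k-1)^2 M} \Big( k \sum_{i=1}^{m} \mu_i^2 - n^2 \Big) - nk \right],$$

where $\mathrm{Var}_{k,M}(a_{1,1})$ denotes the variance of the entry $a_{1,1}$ in a random matrix $A \in \mathcal{A}_{k,M}$ chosen
according to the distribution $Q_{k,M}$.


The proof of the lemma is deferred to Appendix B. The quantities $Z_{k,M}$ and $\mathrm{Var}_{k,M}(a_{1,1})$ in
the above expression for $G''_{k,M}(0)$ can be determined explicitly for k = 2, and asymptotically as
$M \to \infty$ for $k \ge 3$.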


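Proposition 5. (a) $Z_{2,M} = M + 1$ and $\mathrm{Var}_{2,M}(a_{1,1}) = \frac{1}{12} M(M + 2)$.

(b) For $k \ge 3$, we have

$$\lim_{M \to \infty} \Big[ (M!)^{2k - k^2} Z_{k,M} \Big]^{1/M} = \frac{(k-1)^{k(k-1)}}{k^{k(k-2)}}$$

and

$$\lim_{M \to \infty} \frac{\mathrm{Var}_{k,M}(a_{1,1})}{M} = \frac{(k-1)^3}{k^3 (k-2)}.$$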

The proof of the proposition will be given in the next section. As a direct consequence of
Lemma 4 and Proposition 5, we have the following result.

^5 Here, and for the remainder of this section, we will suppress the dependence on $\psi$ in our notation; thus, we write
$G_{k,M}(t)$ instead of $G^{\psi}_{k,M}(t)$, $T_B$ instead of $T_B(\psi)$, and so on.

^6 See Proposition 2.
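Corollary 5.1. (a) When k = m = 2,

$$\lim_{M \to \infty} G''_{2,M}(0) = \begin{cases} -\dfrac{n}{2^{n-1}} & \text{if } \mu_1 = \mu_2 \\ +\infty & \text{if } \mu_1 \ne \mu_2. \end{cases}$$

(b) When $k \ge 3$ (and $k \ge m$),

$$\lim_{M \to \infty} G''_{k,M}(0) = \frac{(k-1)^{k(k-1)}}{k^{k(k-2)}} \cdot \frac{1}{k^{n-1}} \left[ \frac{k-1}{k(k-2)} \Big( k \sum_{i=1}^{m} \mu_i^2 - n^2 \Big) - n \right]. \tag{16}$$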


Proof. Only part (a) requires a note of explanation. When k = m = 2, the term $k \sum_{i=1}^{m} \mu_i^2 - n^2$
in the expression for $G''_{k,M}(0)$ in Lemma 4 reduces to $2(\mu_1^2 + \mu_2^2) - (\mu_1 + \mu_2)^2$, which equals
$(\mu_1 - \mu_2)^2$. □

For our purposes, it is only the sign of $\lim_{M \to \infty} G''_{k,M}(0)$ that matters, so we will make much
use of the weaker corollary below.

Corollary 5.2. (a) When k = m = 2, we have $\lim_{M \to \infty} G''_{2,M}(0) < 0$ if $\mu_1 = \mu_2$, and
$\lim_{M \to \infty} G''_{2,M}(0) > 0$ if $\mu_1 \ne \mu_2$.
(b) When $k \ge 3$, we have $\lim_{M \to \infty} G''_{k,M}(0) \lessgtr 0$ iff

$$k^2 (U - n) - k (U + n^2 - 2n) + n^2 \lessgtr 0, \quad \text{where } U = \sum_{i=1}^{m} \mu_i^2.$$


Proof. It suffices to point out that the condition in part (b) above is equivalent to the term within
square brackets in (16) being negative or positive. □

Thus, when $k \ge 3$, the sign of $\lim_{M \to \infty} G''_{k,M}(0)$ depends only on where k lies in relation to the
roots of the quadratic polynomial $x^2 (U - n) - x (U + n^2 - 2n) + n^2$. Note that $U = \sum_{i=1}^{m} \mu_i^2 \ge
\sum_{i=1}^{m} \mu_i = n$, with equality iff $\mu_i = 1$ for all $i \in [m]$, i.e., $\psi = 123 \ldots n$. The lemma below
summarizes the behaviour of the roots of the quadratic equation.

Lemma 6. Write $q(x) = x^2 (U - n) - x (U + n^2 - 2n) + n^2$, which has discriminant $D =
(n^2 + 2n - U)^2 - 4n^3$. Recall that $T = \frac{n^2 - n}{U - n}$.

(1) If $U = n$ (which happens iff $\psi = 123 \ldots n$), then $q(x) = -x(n^2 - n) + n^2$ has $\frac{n}{n-1}$ as its
only root. Since $\frac{n}{n-1} \le 2$, we have $q(k) < 0$ for all $k \ge 3$.

(2) If $U > n$, then we have exactly one of the following two cases:

(a) The discriminant D is strictly negative, which happens iff $T < \frac{\sqrt{n} + 1}{\sqrt{n} - 1}$, so that $q(x)$ has
no real roots. In this case, $q(k) > 0$ for all k.

(b) The quadratic has two real roots $\rho_1 \le \rho_2$ given by

$$\rho_1 = \frac{U + n^2 - 2n - \sqrt{D}}{2(U - n)}, \qquad \rho_2 = \frac{U + n^2 - 2n + \sqrt{D}}{2(U - n)}.$$

In this case, we have $1 < \rho_1 < 2$, so that $q(k) < 0$ for $3 \le k < \rho_2$, and $q(k) > 0$ for
all $k > \rho_2$. Furthermore, $T - 1 < \rho_2 < T$ holds.
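The quantities in Lemma 6 are easy to evaluate for a concrete pattern. The following sketch, with hypothetical function names, computes U, the discriminant D, and (when $D \ge 0$) the roots of q. It can be used to spot-check the claimed bounds on the roots for individual patterns; it is not a substitute for the proof in Appendix C.

```python
from math import sqrt


def quadratic_data(mu):
    """Return (n, U, D) for q(x) = x^2 (U - n) - x (U + n^2 - 2n) + n^2."""
    n = sum(mu)
    U = sum(x * x for x in mu)
    D = (n * n + 2 * n - U) ** 2 - 4 * n ** 3
    return n, U, D


def q_roots(mu):
    """Roots (rho1, rho2) of q, or None when D < 0. Assumes U > n."""
    n, U, D = quadratic_data(mu)
    if U == n:
        raise ValueError("U = n: q is linear, see case (1) of Lemma 6")
    if D < 0:
        return None
    b = U + n * n - 2 * n
    return (b - sqrt(D)) / (2 * (U - n)), (b + sqrt(D)) / (2 * (U - n))


def pml_threshold_float(mu):
    """T = (n^2 - n) / (U - n) as a float. Assumes U > n."""
    n, U, _ = quadratic_data(mu)
    return (n * n - n) / (U - n)


mu_sleepless = [3, 2, 3, 1]
rho1, rho2 = q_roots(mu_sleepless)
T = pml_threshold_float(mu_sleepless)
```

For the multiplicities of “sleepless”, the discriminant is positive, the smaller root lies between 1 and 2, and the larger root lies between T - 1 and T, in line with case (2)(b).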

The proof of the lemma is given in Appendix C. We now have the tools required to complete the
proof of Theorem 3.

Proof of Theorem 3. Recall that the goal is to show that there is a threshold $T_B$ independent of the
direction vector $\xi$ such that (13) holds. Corollary 5.1 shows that $\lim_{M \to \infty} G''_{k,M}(0)$ is independent
of $\xi$. With this, the $m \ge 3$ case of Theorem 3 follows directly from Corollary 5.2(b) and Lemma 6.
The m = 2 case requires a few additional details to be checked. When $\mu_1 = \mu_2 = 1$,
Corollary 5.2 and Lemma 6(1) show that $\lim_{M \to \infty} G''_{k,M}(0) < 0$ for all $k \ge 2$, and hence, we can take
$T_B = \infty$.
When $\mu_1 = \mu_2 > 1$, we need to argue that (13) holds for any $T_B \in (2, 3)$. Corollary 5.2(a)
shows that $\lim_{M \to \infty} G''_{k,M}(0) < 0$ for k = 2. Therefore, it remains to show that $\lim_{M \to \infty} G''_{k,M}(0) >
0$ for all $k \ge 3$. Similarly, when $\mu_1 \ne \mu_2$, we need to show that $\lim_{M \to \infty} G''_{k,M}(0) > 0$ for all
$k \ge 2$, so that we can take $T_B \in (1, 2)$ in this case. Again, Corollary 5.2(a) takes care of the
k = 2 case, so we are left with $k \ge 3$. In summary, we must show that if $(\mu_1, \mu_2) \ne (1, 1)$, then
$\lim_{M \to \infty} G''_{k,M}(0) > 0$ for all $k \ge 3$. For this, we will appeal to Lemma 6(2).

If the discriminant D is negative, then we are done by Lemma 6(2)(a). So, we may assume
that $D \ge 0$, in which case the situation of Lemma 6(2)(b) applies. It suffices to show that when
$(\mu_1, \mu_2) = (a, b) \ne (1, 1)$, then $\rho_2 \le 2$. Note that $D = (2(a + b) + 2ab)^2 - 4(a + b)^3 =
4[(a + b + ab)^2 - (a + b)^3]$. Using the fact that $(a + b)^3 \ge 4ab(a + b)$, we obtain $D \le 4[(a + b +
ab)^2 - 4ab(a + b)] = 4(a + b - ab)^2$. Thus, $\sqrt{D} \le 2|a + b - ab|$, and hence,

$$\rho_2 = 1 + \frac{n^2 - U + \sqrt{D}}{2(U - n)} \le 1 + \frac{ab + |a + b - ab|}{a^2 + b^2 - (a + b)}. \tag{17}$$

Now, for positive integers a, b (taking $a \ge b$ without loss of generality), we have $ab < a + b$ only if b = 1.
If (a, b) = (2, 1), then observe that $D = -8 < 0$, which cannot happen. So we can have $ab < a + b$ only if b = 1 and $a \ge 3$. In
this case, (17) becomes $\rho_2 \le 1 + \frac{a + 1}{a^2 - a} < 2$, the last inequality holding for any $a > 1 + \sqrt{2}$.

Finally, if $ab \ge a + b$, then (17) reduces to

$$\rho_2 \le 1 + \frac{2ab - (a + b)}{a^2 + b^2 - (a + b)},$$

which is at most 2. We have thus shown that $\rho_2 \le 2$ whenever $(\mu_1, \mu_2) \ne (1, 1)$, which completes
the proof of the m = 2 case of the theorem. □


5. The Probability Distribution Qk,M 

The main aim of this section is to prove Proposition 5. For convenience, we recall the relevant
definitions first. We use $\mathcal{A}_{k,M}$ to denote the set of $k \times k$ non-negative integer matrices whose row
and column sums are all equal to M. The probability distribution $Q_{k,M}$ on $\mathcal{A}_{k,M}$ is defined by
setting, for $A \in \mathcal{A}_{k,M}$, $Q_{k,M}(A) = \frac{1}{Z_{k,M}} w(A)$, where $w(A)$ and $Z_{k,M}$ are given by (14) and (15).
We are especially interested in determining $\mathrm{Var}_{k,M}(a_{1,1})$, the variance of the entry $a_{1,1}$ in a random
matrix $A = (a_{i,j}) \in \mathcal{A}_{k,M}$ chosen according to the distribution $Q_{k,M}$.

To compute the variance of $a_{1,1}$, we need to know the mean $E[a_{1,1}]$, where $E[\cdot]$ denotes
expectation with respect to the probability distribution $Q_{k,M}$. As the following lemma shows, this is quite
straightforward.

Lemma 7. For any entry aij of a random matrix A G Ak,M chosen according to Qk,M> we have 

E[aiy] = -^M. 


Proof. Note fhaf w{A), and consequenfly, fhe probabilify function Qk,M{A), is inyarianf fo row and 
column permufafions of A. Therefore, fhe expected yalue E[aj j] musf be a consfanf independenf of 
(i,y). This consfanf can be explicifly eyaluafed as follows: 




£=1 
k 


'-£=1 




the last equality using the fact that 


□ 


The invariance of $Q_{k,M}(A)$ to row and column permutations of $A$ also implies that the variance
of $a_{i,j}$ is independent of $(i,j)$. In spite of this, the explicit computation of $\mathrm{Var}_{k,M}(a_{1,1})$ seems
difficult in general, with the notable exception of the case when $k = 2$.

Observe that $\mathcal{A}_{2,M}$ consists precisely of the matrices

\[
\begin{pmatrix} a & M-a \\ M-a & a \end{pmatrix}
\]

with $a \in \{0,1,\ldots,M\}$. For any such matrix $A$, we have $w(A) = 1$, and hence, $Z_{2,M} = |\mathcal{A}_{2,M}| =
M + 1$. In particular, $Q_{2,M}$ is just the uniform distribution on $\mathcal{A}_{2,M}$. It follows that $\mathrm{Var}_{2,M}(a_{1,1})$
is the variance of a random variable uniformly distributed on $\{0,1,\ldots,M\}$. An easy calculation
shows this to be $\frac{1}{12} M(M+2)$, thus completing the proof of part (a) of Proposition 5.

Henceforth, we consider the case when $k \ge 3$. Our interest is in the regime where $k$ is fixed and
$M$ goes to $\infty$.
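
For small $k$ and $M$, Lemma 7 and the $k = 2$ variance can be verified by exhaustive enumeration. The following is a minimal sketch under the assumption that $w(A) = \prod_{i,j} (M - a_{i,j})!/a_{i,j}!$, as in the abstract; the helper names are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def matrices(k, M):
    """Yield every k x k non-negative integer matrix with all row and column sums M."""
    rows = list(compositions(M, k))
    for A in product(rows, repeat=k):
        if all(sum(col) == M for col in zip(*A)):
            yield A

def weight(A, M):
    """w(A) = prod_{i,j} (M - a_ij)! / a_ij!, as an exact fraction."""
    w = Fraction(1)
    for row in A:
        for a in row:
            w *= Fraction(factorial(M - a), factorial(a))
    return w

def mean_and_variance(k, M):
    """Exact mean and variance of a_{1,1} under Q_{k,M}."""
    Z = m1 = m2 = Fraction(0)
    for A in matrices(k, M):
        w = weight(A, M)
        Z += w
        m1 += w * A[0][0]
        m2 += w * A[0][0] ** 2
    mean = m1 / Z
    return mean, m2 / Z - mean ** 2
```

The enumeration grows quickly with $k$ and $M$, so this is only practical for very small cases.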


5.1. Estimating $Z_{k,M}$. To estimate the normalization constant $Z_{k,M}$, we make use of the fact that

\[
w^*_{k,M} \le Z_{k,M} \le |\mathcal{A}_{k,M}|\, w^*_{k,M},
\]

where $w^*_{k,M} = \max_{A \in \mathcal{A}_{k,M}} w(A)$. As we will see below, it is possible to explicitly determine $w^*_{k,M}$,
which, in the regime of interest to us, grows super-exponentially in $M$. Since $|\mathcal{A}_{k,M}| \le (M+1)^{k^2}$,
a quantity that is polynomial in $M$, we see that the asymptotics of $Z_{k,M}$ are governed by the
asymptotics of $w^*_{k,M}$.

Write $M = qk + r$, where $q = \lfloor M/k \rfloor$ and $r = M - kq$. Note that we have $0 \le r < k$. Let
$u = (u_1,\ldots,u_k) \in \mathbb{Z}_+^k$ be defined by $u_i = q + 1$ for $1 \le i \le r$, and $u_i = q$ for $r + 1 \le i \le k$.
Clearly, $\sum_i u_i = M$. Let $U$ be the circulant matrix having $u$ as its first row. The fact that $U$ is a
circulant matrix assures us that it is in $\mathcal{A}_{k,M}$.


Proposition 8. The matrix $U$ maximizes $w(A)$ among all $A \in \mathcal{A}_{k,M}$, and hence,

\[
w^*_{k,M} = w(U) = \left[\frac{(M-(q+1))!}{(q+1)!}\right]^{kr} \left[\frac{(M-q)!}{q!}\right]^{k(k-r)}.
\]

Before giving a proof of the proposition, we briefly discuss its implications. In the regime of
fixed $k$ and $M \to \infty$, routine algebraic manipulations using Stirling's approximation yield

75L (18) 

Thus, $w^*_{k,M}$ grows super-exponentially in $M$, so that, as observed above, its asymptotic behaviour
governs that of $Z_{k,M}$. Consequently, in the left-hand side of (18), we may replace $w^*_{k,M}$ with $Z_{k,M}$,
thereby obtaining the first equality stated in Proposition 5(b).
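
As a rough illustration of the super-exponential growth, here is a sketch of the Stirling estimate in the special case where $k$ divides $M$, so that $q = M/k$, $r = 0$ and every entry of $U$ equals $q$. This is a worked special case under that assumption, and is not claimed to reproduce the exact form of (18). Using $\log m! = m \log m - m + O(\log m)$,

\[
\log w(U) = k^2 \big[ \log (M-q)! - \log q! \big]
= k^2 \big[ (M-q)\log(M-q) - q \log q - (M - 2q) + O(\log M) \big]
= k(k-2)\, M \log M + O(M).
\]

For $k \ge 3$ the leading coefficient $k(k-2)$ is positive, so $w(U)$ grows faster than any exponential in $M$, whereas $|\mathcal{A}_{k,M}|$ contributes only $O(\log M)$ to $\log Z_{k,M}$.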

Our proof of Proposition 8 is based upon the theory of majorization [25]. For any vector $x =
(x_1,\ldots,x_k) \in \mathbb{R}^k$, let $x_\downarrow = (x_{[1]},\ldots,x_{[k]})$ denote the permutation of the components of $x$ such
that $x_{[1]} \ge \cdots \ge x_{[k]}$. A vector $x \in \mathbb{R}^k$ is said to be majorized by a vector $y \in \mathbb{R}^k$, denoted by
$x \prec y$, if for $\ell = 1,\ldots,k-1$, we have

\[
\sum_{i=1}^{\ell} x_{[i]} \le \sum_{i=1}^{\ell} y_{[i]},
\]

and $\sum_{i=1}^{k} x_{[i]} = \sum_{i=1}^{k} y_{[i]}$ (or equivalently, $\sum_{i=1}^{k} x_i = \sum_{i=1}^{k} y_i$).

The next lemma plays an important role in our proof of Proposition 8. Let us define $S_{k,M}$ to be
the set of vectors $x = (x_1,\ldots,x_k) \in \mathbb{Z}_+^k$ such that $\sum_{i=1}^{k} x_i = M$. Also, recall from above our
definition of the vector $u = (u_1,\ldots,u_k) \in S_{k,M}$ with $u_i = q + 1$ for $1 \le i \le r$, and $u_i = q$ for
$r + 1 \le i \le k$, where $q = \lfloor M/k \rfloor$ and $r = M - kq$.

Lemma 9. The vector $u$ is majorized by every $x \in S_{k,M}$.

Proof. Consider an arbitrary $x \in S_{k,M}$. By definition of $S_{k,M}$, the components of $x$ and of $u$ both sum to $M$, so we must
show that $\sum_{i=1}^{\ell} u_{[i]} \le \sum_{i=1}^{\ell} x_{[i]}$ for $\ell = 1,\ldots,k-1$. Suppose that $\sum_{i=1}^{\ell} u_{[i]} > \sum_{i=1}^{\ell} x_{[i]}$ for
some $\ell \in \{1,\ldots,k-1\}$. Note that $\sum_{i=1}^{\ell} u_{[i]} = \ell q + \min(\ell, r)$. Thus,

\[
\sum_{i=1}^{\ell} x_{[i]} < \sum_{i=1}^{\ell} u_{[i]} = \ell q + \min(\ell, r),
\]

from which we infer that $x_{[\ell]} < q + \min(1, r/\ell)$. Since $x_{[\ell]}$ must be an integer, we deduce that
$x_{[\ell+1]} \le x_{[\ell]} \le q$.

On the other hand, we also have

\[
(k-\ell)\, x_{[\ell+1]} \ge \sum_{i=\ell+1}^{k} x_{[i]} = M - \sum_{i=1}^{\ell} x_{[i]}
> M - \sum_{i=1}^{\ell} u_{[i]}
= qk + r - (\ell q + \min(\ell, r))
\ge q(k-\ell).
\]

Hence, $x_{[\ell+1]} > q$, which contradicts the inequality $x_{[\ell+1]} \le q$ deduced previously. Therefore, our
assumption that $\sum_{i=1}^{\ell} u_{[i]} > \sum_{i=1}^{\ell} x_{[i]}$ cannot hold. □
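
Lemma 9 is easy to confirm by brute force for small parameters. The sketch below is illustrative only; the function names are hypothetical, and `majorized_by` implements the partial-sum definition of majorization given above.

```python
def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def majorized_by(x, y):
    """Return True if x is majorized by y, i.e. x ≺ y."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if sum(xs) != sum(ys):
        return False
    px = py = 0
    # Compare the partial sums of the l largest components, l = 1, ..., k-1.
    for a, b in zip(xs[:-1], ys[:-1]):
        px += a
        py += b
        if px > py:
            return False
    return True

def balanced_vector(k, M):
    """The vector u with r entries equal to q + 1 and k - r entries equal to q."""
    q, r = divmod(M, k)
    return (q + 1,) * r + (q,) * (k - r)
```

For instance, with $k = 4$ and $M = 10$ the vector $u = (3,3,2,2)$ is majorized by each of the 286 vectors in $S_{4,10}$.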

A function $\phi : \mathbb{Z}_+^k \to \mathbb{R}$ is said to be Schur-concave if for all $x, y \in \mathbb{Z}_+^k$,

\[
x \prec y \implies \phi(x) \ge \phi(y).
\]

Note that if $x$ and $y$ are permutations of each other, then $x_\downarrow = y_\downarrow$, so that both $x \prec y$ and $y \prec x$
hold. Hence, a Schur-concave function $\phi$ must be symmetric, which means that $\phi(x) = \phi(y)$
whenever $x$ and $y$ are permutations of each other.

A characterization of Schur-concave functions is given in [25, Chapter 3, A.2.b], which we adapt
to our context in the proposition below.

Proposition 10. A function $\phi : \mathbb{Z}_+^k \to \mathbb{R}$ is Schur-concave iff it is symmetric, and for each choice of
non-negative integers $s, x_3, \ldots, x_k$, the function $\phi(x, s-x, x_3, \ldots, x_k)$ is monotonically decreasing
in $x$ for $x \ge s/2$.

Now, define the function $\phi : \mathbb{Z}_+^k \to \mathbb{R}$ as follows:

\[
\phi(x_1,\ldots,x_k) = \prod_{j=1}^{k} \frac{\big(\sum_{\ell \ne j} x_\ell\big)!}{x_j!}. \qquad (19)
\]

An application of Proposition 10 shows that this function $\phi$ is Schur-concave. Indeed, $\phi$ is obviously
symmetric. We claim that, for any choice of non-negative integers $s, x_3, \ldots, x_k$, the function
$\psi(x) := \phi(x, s-x, x_3, \ldots, x_k)$ is monotonically decreasing in $x$ for integers $x \ge s/2$. To see this,
first verify that

\[
\frac{\psi(x+1)}{\psi(x)} = \frac{(x + 1 + x_3 + \cdots + x_k)(s - x)}{(x+1)(s - x + x_3 + \cdots + x_k)}.
\]

Now, we must show that for $x \ge s/2$, this ratio is at most 1, or equivalently, $(x + 1 + x_3 + \cdots +
x_k)(s - x) - (x + 1)(s - x + x_3 + \cdots + x_k) \le 0$. Upon some re-arrangement and cancellation of
like terms, the left-hand side becomes $(x_3 + \cdots + x_k)(s - x - (x+1))$, which is at most 0 when
$x \ge s/2$.

Proof of Proposition 8. For any $A \in \mathcal{A}_{k,M}$, let $a_1,\ldots,a_k$ denote the rows of $A$. Note that $w(A) =
\prod_{i=1}^{k} \phi(a_i)$, where $\phi$ is as defined in (19).

Since all row sums of $A$ are equal to $M$, the row vectors $a_1,\ldots,a_k$ are all in $S_{k,M}$. By Lemma 9
and Schur-concavity of $\phi$, we have

\[
w(A) = \prod_{i=1}^{k} \phi(a_i) \le \prod_{i=1}^{k} \phi(u) = w(U),
\]

which proves the proposition. □

We remark that with a little bit of extra work, it can be shown that when $k$ divides $M$ (and $k \ge 3$),
the matrix $U$, which in this case has all entries equal to $M/k$, uniquely maximizes $w(A)$ among
$A \in \mathcal{A}_{k,M}$. When $k$ does not divide $M$, any matrix obtained by permuting the rows and columns
of $U$ is also a maximizer.
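
Proposition 8 can be checked exhaustively in small cases by comparing the brute-force maximum of $w(A)$ with $w(U)$ and with the closed-form product over the entries of $U$. The code below is a minimal sketch with hypothetical helper names; it assumes $k \ge 2$ and $w(A) = \prod_{i,j} (M - a_{i,j})!/a_{i,j}!$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def weight(A, M):
    """w(A) = prod_{i,j} (M - a_ij)! / a_ij!, as an exact fraction."""
    w = Fraction(1)
    for row in A:
        for a in row:
            w *= Fraction(factorial(M - a), factorial(a))
    return w

def circulant_U(k, M):
    """Circulant matrix whose first row has r entries q + 1 followed by k - r entries q."""
    q, r = divmod(M, k)
    u = (q + 1,) * r + (q,) * (k - r)
    return tuple(tuple(u[(j - i) % k] for j in range(k)) for i in range(k))

def closed_form(k, M):
    """w(U): U has k*r entries equal to q + 1 and k*(k - r) entries equal to q."""
    q, r = divmod(M, k)
    return (Fraction(factorial(M - q - 1), factorial(q + 1)) ** (k * r)
            * Fraction(factorial(M - q), factorial(q)) ** (k * (k - r)))

def brute_force_max(k, M):
    """Maximum of w(A) over all k x k matrices with row and column sums M."""
    rows = list(compositions(M, k))
    return max(weight(A, M) for A in product(rows, repeat=k)
               if all(sum(col) == M for col in zip(*A)))
```

For $k = 3$ and $M = 4$, the matrix $U$ has three entries equal to 2 and six entries equal to 1, giving $w(U) = 6^6$.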


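5.2. Estimating $\mathrm{Var}_{k,M}(a_{1,1})$. Recalling that the variance of $a_{i,j}$ is independent of $(i,j)$, we have

\[
\mathrm{Var}_{k,M}(a_{1,1}) = \frac{1}{k^2} \sum_{i,j} \mathrm{Var}_{k,M}(a_{i,j})
= \frac{1}{k^2}\, \mathbb{E}\Big[\sum_{i,j} (a_{i,j} - M/k)^2\Big].
\]

We will recast this in terms of the matrix $U$ defined previously. Recall that this matrix has all its
entries $u_{i,j}$ equal to either $\lfloor M/k \rfloor$ or $\lceil M/k \rceil$.

For any matrix $T = (t_{i,j})$, we use $\|T\|^2$ to denote the sum $\sum_{i,j} t_{i,j}^2$. Observe that

\[
\mathbb{E}[\|A-U\|^2] = \mathbb{E}\Big[\sum_{i,j} (a_{i,j} - u_{i,j})^2\Big]
= \mathbb{E}\Big[\sum_{i,j} \big((a_{i,j} - M/k) - (u_{i,j} - M/k)\big)^2\Big]
= \mathbb{E}\Big[\sum_{i,j} (a_{i,j} - M/k)^2\Big] + \sum_{i,j} (u_{i,j} - M/k)^2,
\]

the cross terms vanishing because $\mathbb{E}[a_{i,j}] = M/k$ by Lemma 7. Hence,

\[
\Big| \mathrm{Var}_{k,M}(a_{1,1}) - \frac{1}{k^2}\, \mathbb{E}[\|A-U\|^2] \Big|
= \frac{1}{k^2} \sum_{i,j} (u_{i,j} - M/k)^2 \le 1. \qquad (20)
\]

Thus, for our purposes, it suffices to estimate $\mathbb{E}[\|A-U\|^2]$.

Our strategy to estimate $\mathbb{E}[\|A-U\|^2]$ is to approximate the distribution $Q_{k,M}$ by a suitably
chosen discrete Gaussian distribution. The implementation of this strategy rests upon the following
lemma.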


Lemma 11. Fix k > 3. Let p = M/k, and for A G Ak,M, T = {Uj) = A — U. If p > ^ and 
maxjj |fjj| < then 


w{A) w{A) 

log —— - log 


w{U) ^ w{U) 

wftere i5(>l) := exp { - i . |5f . i j;, j (2, }, 




E(i*‘ 






Jh 


The proof of the lemma is given in Appendix D. 

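Taking cue from Lemma 11, we define a discrete Gaussian measure on the set, $\overline{\mathcal{A}}_{k,M}$, of all
$k \times k$ integer (not necessarily non-negative) matrices $A = (a_{i,j})$ with row and column sums equal
to $M$. For any such matrix $A$, define $\hat{w}(A) := \exp\big\{ -\frac{1}{2\sigma^2_{k,M}} \sum_{i,j} t_{i,j}^2 \big\}$, where $(t_{i,j}) = A - U$ and
$\sigma^2_{k,M} := \frac{k-1}{k-2} \cdot \frac{M}{k}$. We then define a probability measure $\hat{Q}_{k,M}$ on $\overline{\mathcal{A}}_{k,M}$ as follows: for $A \in \overline{\mathcal{A}}_{k,M}$,

\[
\hat{Q}_{k,M}(A) := \frac{1}{\hat{Z}_{k,M}}\, \hat{w}(A),
\]

where $\hat{Z}_{k,M} = \sum_{A \in \overline{\mathcal{A}}_{k,M}} \hat{w}(A)$.

Let $\hat{\mathbb{E}}[\cdot]$ denote expectation with respect to the measure $\hat{Q}_{k,M}$. To be clear, when we write
$\mathbb{E}[\|A-U\|^2]$, we mean $\sum_{A \in \mathcal{A}_{k,M}} \|A-U\|^2\, Q_{k,M}(A)$, while $\hat{\mathbb{E}}[\|A-U\|^2]$ refers to
$\sum_{A \in \overline{\mathcal{A}}_{k,M}} \|A-U\|^2\, \hat{Q}_{k,M}(A)$. The following lemma, proved in Appendix E, shows that
$\mathbb{E}[\|A-U\|^2]$ is well-approximated by $\hat{\mathbb{E}}[\|A-U\|^2]$ as $M \to \infty$.

Lemma 12. $\mathbb{E}[\|A-U\|^2] = \hat{\mathbb{E}}[\|A-U\|^2] + o(M)$.

The usefulness of this lemma stems from the fact that $\hat{\mathbb{E}}[\|A-U\|^2]$ can be estimated very
accurately. To do this, we express $\hat{\mathbb{E}}[\|A-U\|^2]$ in a different form. For any $A \in \overline{\mathcal{A}}_{k,M}$, note that
$A - U$ is an integer matrix all of whose row and column sums are equal to 0, i.e., $A - U \in \overline{\mathcal{A}}_{k,0}$.
For simplicity, let $\mathcal{T}_k := \overline{\mathcal{A}}_{k,0}$. We then have
$\hat{\mathbb{E}}[\|A-U\|^2] = \frac{1}{Z} \sum_{T \in \mathcal{T}_k} \|T\|^2 \exp\big\{ -\frac{1}{2\sigma^2_{k,M}} \|T\|^2 \big\}$,
where $Z := \sum_{T \in \mathcal{T}_k} \exp\big\{ -\frac{1}{2\sigma^2_{k,M}} \|T\|^2 \big\}$.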

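For any $T = (t_{i,j}) \in \mathcal{T}_k$, each of the entries in the $k$th row and column of $T$ is linearly dependent
on the entries $t_{i,j}$, $1 \le i,j \le k-1$. To be precise,

\[
t_{i,k} = -\sum_{j=1}^{k-1} t_{i,j}, \quad i = 1,\ldots,k-1;
\]
\[
t_{k,j} = -\sum_{i=1}^{k-1} t_{i,j}, \quad j = 1,\ldots,k-1;
\]
\[
t_{k,k} = -\sum_{i=1}^{k-1} t_{i,k} = \sum_{i=1}^{k-1} \sum_{j=1}^{k-1} t_{i,j}.
\]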

It follows that $\|T\|^2$ is a positive definite quadratic form in the $(k-1)^2$ variables $t_{i,j}$, $1 \le i,j \le
k-1$. Hence, we can write¹

\[
\|T\|^2 = t'Bt,
\]

where $t \in \mathbb{Z}^{(k-1)^2}$ is a vector with coordinates $t_{i,j}$, $1 \le i,j \le k-1$ (in some fixed linear order),
and $B$ is a symmetric positive definite matrix. With this, we have

\[
\hat{\mathbb{E}}[\|A-U\|^2] = \frac{1}{Z} \sum_{t \in \mathbb{Z}^{(k-1)^2}} t'Bt\, \exp\Big\{ -\frac{1}{2\sigma^2_{k,M}}\, t'Bt \Big\},
\]

where $Z$ can now be written as

\[
Z = \sum_{t \in \mathbb{Z}^{(k-1)^2}} \exp\Big\{ -\frac{1}{2\sigma^2_{k,M}}\, t'Bt \Big\}.
\]

¹To avoid notational confusion, we write $t'$, $x'$ etc. instead of $t^T$, $x^T$ etc. to denote the transpose of $t$, $x$ etc.

Thus, we see that $\hat{\mathbb{E}}[\|A-U\|^2]$ is equal to the expected value of $t'Bt$, where $t$ is a random vector
in $\mathbb{Z}^{(k-1)^2}$ distributed according to the discrete Gaussian measure $\frac{1}{Z} \exp\big\{ -\frac{1}{2\sigma^2_{k,M}}\, t'Bt \big\}$. The
lemma below is a direct consequence of Proposition 18 in Appendix F.

Lemma 13. We have

\[
\hat{\mathbb{E}}[\|A-U\|^2] = (k-1)^2 \sigma^2_{k,M} + O(\exp(-c_k M)),
\]

where $c_k$ is a positive constant depending only on $k$.

Equation (20) and Lemmas 12 and 13 suffice to prove that

\[
\mathrm{Var}_{k,M}(a_{1,1}) = \frac{(k-1)^2}{k^2}\, \sigma^2_{k,M} + o(M),
\]

from which we obtain the second statement of Proposition 5(b).
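
The quadratic form $t'Bt$ used in this subsection can be made concrete. A short calculation, which is our own illustration rather than a statement from the text, suggests that with the row-major ordering of the $t_{i,j}$ one may take $B = (I + J) \otimes (I + J)$, where $I$ and $J$ are the identity and all-ones matrices of order $k-1$; since $I + J$ has eigenvalues 1 and $k$, this $B$ is positive definite. The sketch below (hypothetical helper names) checks the identity $\|T\|^2 = t'Bt$ on integer inputs.

```python
import random

def extend(t):
    """Extend a (k-1) x (k-1) integer block to the k x k matrix T with zero row and column sums."""
    d = len(t)
    T = [list(row) + [-sum(row)] for row in t]
    T.append([-sum(T[i][j] for i in range(d)) for j in range(d + 1)])
    return T

def gram_matrix(d):
    """B = (I + J) kron (I + J) for the row-major ordering of the d*d free entries."""
    c = [[2 if i == j else 1 for j in range(d)] for i in range(d)]
    return [[c[i][p] * c[j][q] for p in range(d) for q in range(d)]
            for i in range(d) for j in range(d)]

def quadratic_form(B, t):
    """Evaluate v' B v, where v lists the entries of the block t in row-major order."""
    v = [x for row in t for x in row]
    return sum(v[a] * B[a][b] * v[b] for a in range(len(v)) for b in range(len(v)))

def squared_norm(T):
    """Sum of squares of all entries of T."""
    return sum(x * x for row in T for x in row)
```

For $k = 2$ this reduces to $\|T\|^2 = 4 t_{1,1}^2$, matching the $2 \times 2$ matrices with entries $\pm t_{1,1}$.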


6. Concluding Remarks 

The two main results of this paper, namely, Theorems 1 and 3, encapsulate an abrupt transition in
behaviour of the uniform distribution in relation to the mapping $\beta_{k,M} : \Pi_{[k]} \to \mathbb{R}_+$ defined by
$p \mapsto \mathrm{perm}_{B,M}(\Theta(\psi; p))$. While the former theorem concerns the $M = 1$ case, the latter theorem
holds for all sufficiently large $M$. Our proofs of these theorems involve picking an arbitrary unit
vector $\xi \in \mathbb{R}^k$ such that $p(t) = u_k + t\xi$ lies within $\Pi_{[k]}$ for all $t$ within a sufficiently small
interval around 0, and analyzing the first and second derivatives at $t = 0$ of the function $G_{k,M}(t) =
\beta_{k,M}(p(t))$.

Unfortunately, we have not been able to successfully extend this proof technique to deduce an
analogous result for the mapping $\beta_{k,\infty} : \Pi_{[k]} \to [0,1]$ defined by $p \mapsto \mathrm{perm}_B(\Theta(\psi; p))$. The main
technical obstacle here is the fact that the corresponding function $G_{k,\infty}(t) = \beta_{k,\infty}(p(t))$ may not
be differentiable. (We do know that this function is continuous, since $\mathrm{perm}_B(\Theta(\psi; p))$ was noted to be
a continuous function of $p$ in the paragraph following (8).) Note that Vontobel's identity (11) only
allows us to say that $\limsup_{M\to\infty} G_{k,M}(t) = G_{k,\infty}(t)$ pointwise in $t$. By itself, this is insufficient
to claim the differentiability of $G_{k,\infty}$.

It would be possible to prove the desired result if we could show that the functions $G_{k,M}$ satisfy
the following two conditions within a sufficiently small interval $I$ about 0: (i) $\lim_{M\to\infty} G_{k,M}(t)$
exists pointwise on $I$, and (ii) the first three derivatives of $G_{k,M}$ are uniformly bounded on $I$.
Indeed, using the properties of equicontinuous and uniformly convergent functions (for example,
Theorems 7.17 and 7.25 in [26]), we can then deduce that $G_{k,\infty}$ is twice-differentiable in the interior
of $I$. Moreover, we would have $\lim_{M\to\infty} G_{k,M}(t) = G_{k,\infty}(t)$, $\lim_{M\to\infty} G'_{k,M}(t) = G'_{k,\infty}(t)$ and
$\lim_{M\to\infty} G''_{k,M}(t) = G''_{k,\infty}(t)$ for all $t$ in the interior of $I$. In particular, these hold at $t = 0$. From
this, we could conclude, via Lemma 4 and Corollary 5.2, that the mapping $\beta_{k,\infty}$ admits a phase
transition at the same threshold as in Theorem 3.

We would unhesitatingly conjecture that condition (i) above is true, but we are less sure about
condition (ii). It is actually not difficult to show that $G'_{k,M}$ is uniformly bounded within some small
interval $I$, but the uniform boundedness of the second and third derivatives presents difficulties.
It is in fact entirely possible that this approach will not work, as it may be the case that $G_{k,\infty}$ is
not differentiable. However, we do believe that the ultimate result is still true: there is sufficient
numerical evidence in favour of there being a phase transition at the threshold $T_B(\psi)$, defined in
Theorem 3, for the uniform distribution in the Bethe PML problem.

Appendix A: Proof of Proposition 2


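From Definition 2, we see that $\mathrm{perm}_{B,M}(\Theta)$ involves the permanents of $kM \times kM$ matrices
$\Theta \odot \Lambda$ as in (10). Each such permanent involves a sum over permutations $\pi : [kM] \to [kM]$. We
identify a permutation $\pi : [kM] \to [kM]$ with the permutation matrix in $\mathcal{P}_{kM}$ whose $(s,t)$th entry
is a 1 iff $\pi(s) = t$; in a slight abuse of notation, we let $\pi$ denote this permutation matrix as well.
We will find it convenient to view any matrix $\pi \in \mathcal{P}_{kM}$ as a block matrix of the form

\[
\pi = \begin{pmatrix}
\pi^{(1,1)} & \pi^{(1,2)} & \cdots & \pi^{(1,k)} \\
\pi^{(2,1)} & \pi^{(2,2)} & \cdots & \pi^{(2,k)} \\
\vdots & \vdots & & \vdots \\
\pi^{(k,1)} & \pi^{(k,2)} & \cdots & \pi^{(k,k)}
\end{pmatrix}, \qquad (21)
\]

where for $1 \le i,j \le k$, the $(i,j)$th block $\pi^{(i,j)}$ is the $M \times M$ submatrix of $\pi$ located at the
intersection of the rows and columns indexed by $(i-1)M+1,\ldots,iM$ and $(j-1)M+1,\ldots,jM$,
respectively. Let $a_{i,j}(\pi)$ denote the number of 1s in $\pi^{(i,j)}$.

Given two 0/1-matrices $P = (p_{i,j})$ and $Q = (q_{i,j})$ of the same size, we write $P \le Q$ if for
all $i,j$, we have $p_{i,j} \le q_{i,j}$ (or equivalently, $p_{i,j} = 1 \implies q_{i,j} = 1$). Using the newly introduced
notation, and writing $\theta_{i,j}$ for the $(i,j)$th entry of $\Theta$, we observe that

\[
\mathrm{perm}(\Theta \odot \Lambda) = \sum_{\pi \in \mathcal{P}_{kM} :\, \pi \le \Lambda}\ \prod_{(i,j) \in [k]^2} \theta_{i,j}^{a_{i,j}(\pi)}, \qquad (22)
\]

since permutations $\pi \not\le \Lambda$ contribute only 0s to the permanent. Hence,

\[
\langle \mathrm{perm}(\Theta \odot \Lambda) \rangle
= (M!)^{-k^2} \sum_{\Lambda}\ \sum_{\pi \in \mathcal{P}_{kM} :\, \pi \le \Lambda}\ \prod_{(i,j) \in [k]^2} \theta_{i,j}^{a_{i,j}(\pi)}
= (M!)^{-k^2} \sum_{\pi \in \mathcal{P}_{kM}}\ \sum_{\Lambda :\, \pi \le \Lambda}\ \prod_{(i,j) \in [k]^2} \theta_{i,j}^{a_{i,j}(\pi)}, \qquad (23)
\]

where the sums over $\Lambda$ run over the $(M!)^{k^2}$ matrices as in (9), each of whose $k^2$ blocks $P^{(i,j)}$ is an $M \times M$ permutation matrix.

Now, for a given $\pi \in \mathcal{P}_{kM}$, the number of matrices $\Lambda$ as in (9) such that $\pi \le \Lambda$ is equal to
$\prod_{(i,j) \in [k]^2} (M - a_{i,j}(\pi))!$. This is because, for each $(i,j)$, $\pi^{(i,j)}$ determines the positions of $a_{i,j}(\pi)$
1s in $P^{(i,j)}$, and the positions of the remaining 1s in $P^{(i,j)}$ can be chosen in $(M - a_{i,j}(\pi))!$ ways
to make a permutation matrix. Therefore, carrying on from (23), we have

\[
\langle \mathrm{perm}(\Theta \odot \Lambda) \rangle
= (M!)^{-k^2} \sum_{\pi \in \mathcal{P}_{kM}}\ \prod_{(i,j) \in [k]^2} \theta_{i,j}^{a_{i,j}(\pi)}\, (M - a_{i,j}(\pi))!. \qquad (24)
\]

For any $\pi \in \mathcal{P}_{kM}$, the $k \times k$ matrix $A(\pi) := (a_{i,j}(\pi))$ is a non-negative integer matrix whose
row and column sums are all equal to $M$. Recall from the statement of Proposition 2 that $\mathcal{A}_{k,M}$
denotes the set of all such $k \times k$ matrices. Thus, we can write (24) as

\[
\langle \mathrm{perm}(\Theta \odot \Lambda) \rangle
= (M!)^{-k^2} \sum_{A = (a_{i,j}) \in \mathcal{A}_{k,M}} |\mathcal{P}_{kM}(A)| \prod_{(i,j) \in [k]^2} \theta_{i,j}^{a_{i,j}}\, (M - a_{i,j})!, \qquad (25)
\]

where $\mathcal{P}_{kM}(A) := \{\pi \in \mathcal{P}_{kM} : A(\pi) = A\}$. Therefore, the proof of Proposition 2 would be
complete once we prove the following lemma.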


Lemma 14. For $A = (a_{i,j}) \in \mathcal{A}_{k,M}$, we have

\[
|\mathcal{P}_{kM}(A)| = \frac{(M!)^{2k}}{\prod_{i,j} a_{i,j}!}.
\]

Proof. Given a matrix $A = (a_{i,j}) \in \mathcal{A}_{k,M}$, we can construct permutation matrices $\pi \in \mathcal{P}_{kM}$ such
that $A(\pi) = A$ by following the three steps described below. Our description views a $kM \times kM$
matrix $\pi$ as a $k \times k$ block matrix as in (21).

(1) Fix an $i \in [k]$. For each $j \in [k]$, pick $a_{i,j}$ rows of $\pi^{(i,j)}$ within which to place 1s. Since
$\pi$ cannot have two 1s in the same row, the number of ways in which these $a_{i,j}$ rows, $j =
1,\ldots,k$, can be picked is the multinomial coefficient $\binom{M}{a_{i,1},\ldots,a_{i,k}}$. Then, letting $i$ range over
$[k]$, we see that the number of ways in which rows of $\pi$ can be so chosen is $\prod_{i} \binom{M}{a_{i,1},\ldots,a_{i,k}}$.

(2) Fix a $j \in [k]$. For each $i \in [k]$, pick $a_{i,j}$ columns of $\pi^{(i,j)}$ within which to place 1s. By a
similar argument as above, this can be done in $\binom{M}{a_{1,j},\ldots,a_{k,j}}$ ways. So, letting $j$ range over $[k]$,
we see that the number of ways in which columns of $\pi$ can be so chosen is $\prod_{j} \binom{M}{a_{1,j},\ldots,a_{k,j}}$.

(3) Fix a pair $(i,j) \in [k] \times [k]$, and consider the submatrix of $\pi^{(i,j)}$ formed by the points
of intersection of the $a_{i,j}$ rows and columns chosen in the first two steps. For $\pi$ to be a
permutation matrix, this submatrix should be a permutation matrix as well. Hence, there
are $(a_{i,j})!$ ways of placing 0s and 1s within this submatrix. All other entries of $\pi^{(i,j)}$ must
be 0s. Letting $(i,j)$ range over $[k] \times [k]$, we determine the number of possible choices in
this step to be $\prod_{i,j} a_{i,j}!$.

Thus, putting together the counts in the three steps, we obtain

\[
|\mathcal{P}_{kM}(A)| = \prod_{i} \binom{M}{a_{i,1},\ldots,a_{i,k}} \cdot \prod_{j} \binom{M}{a_{1,j},\ldots,a_{k,j}} \cdot \prod_{i,j} a_{i,j}!,
\]

which simplifies to the expression in the statement of the lemma. □
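
The count in Lemma 14 can be confirmed by direct enumeration when $kM$ is small. The sketch below is illustrative only, with hypothetical function names: it lists all permutations of $[kM]$ (indexed from 0), groups them by their block-count matrix $A(\pi)$, and compares each group size with $(M!)^{2k} / \prod_{i,j} a_{i,j}!$.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def block_counts(perm, k, M):
    """Return A(pi): entry (i, j) counts indices s in block i with perm[s] in block j."""
    A = [[0] * k for _ in range(k)]
    for s, t in enumerate(perm):
        A[s // M][t // M] += 1
    return tuple(map(tuple, A))

def count_by_block_matrix(k, M):
    """Map each block-count matrix A to the number of permutations pi with A(pi) = A."""
    return Counter(block_counts(p, k, M) for p in permutations(range(k * M)))

def lemma14_count(A, M):
    """The closed form (M!)^(2k) / prod_{i,j} a_ij! from Lemma 14."""
    k = len(A)
    return factorial(M) ** (2 * k) // prod(factorial(a) for row in A for a in row)
```

For $k = M = 2$, the 24 permutations split as $4 + 16 + 4$ over the three matrices in $\mathcal{A}_{2,2}$.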


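Appendix B: Proof of Lemma 4

We introduce some convenient notation to be used in the proof. For $A = (a_{i,j}) \in \mathcal{A}_{k,M}$, define
$\gamma_A(t) = \prod_{i,j} (p_i(t))^{\mu_j a_{i,j}}$ and $\gamma(t) = \sum_{A \in \mathcal{A}_{k,M}} w(A)\gamma_A(t)$. We will need the values of $\gamma(0)$,
$\gamma'(0)$ and $\gamma''(0)$, for which we need to compute $\gamma_A(0)$, $\gamma'_A(0)$ and $\gamma''_A(0)$.

Determining $\gamma_A(0)$ is easy: since $p_i(0) = \frac{1}{k}$ for all $i \in [k]$, we have

\[
\gamma_A(0) = \prod_{i,j} \Big(\frac{1}{k}\Big)^{\mu_j a_{i,j}} = k^{-Mn},
\]

the last equality using $\sum_i a_{i,j} = M$ and $\sum_j \mu_j = n$. Hence,

\[
\gamma(0) = \sum_A w(A)\gamma_A(0) = k^{-Mn} Z_{k,M}. \qquad (26)
\]

Similarly, taking the derivative of $\gamma_A(t)$, it is straightforward to show that

\[
\gamma'_A(0) = k^{1-Mn} \sum_{i,j} \xi_i \mu_j a_{i,j},
\]

and hence,

\[
\gamma'(0) = \sum_A w(A)\gamma'_A(0) = k^{1-Mn} \sum_{i,j} \xi_i \mu_j \sum_A a_{i,j} w(A).
\]

Now, $\sum_A a_{i,j} w(A) = Z_{k,M}\, \mathbb{E}[a_{i,j}]$, where $\mathbb{E}[\cdot]$ denotes expectation taken with respect to a random
matrix $A \in \mathcal{A}_{k,M}$ distributed according to $Q_{k,M}$. By Lemma 7, $\mathbb{E}[a_{i,j}]$ is a constant independent
of $(i,j)$, and hence,

\[
\gamma'(0) = k^{1-Mn} Z_{k,M}\, (\text{const.}) \Big(\sum_i \xi_i\Big) \Big(\sum_j \mu_j\Big) = 0, \qquad (27)
\]

since $\sum_i \xi_i = 0$ by choice of the direction vector $\xi$.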

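Calculation of the second derivative $\gamma''(0)$ requires a lot more work. To start with, routine
differentiation yields

\[
\gamma''_A(0) = k^{2-Mn} \Big[ \Big(\sum_{i,j} \xi_i \mu_j a_{i,j}\Big)^2 - \sum_{i,j} \xi_i^2 \mu_j a_{i,j} \Big],
\]

which we can plug into

\[
\gamma''(0) = \sum_A w(A)\gamma''_A(0) = Z_{k,M}\, \mathbb{E}[\gamma''_A(0)]
\]

to get

\[
\gamma''(0) = Z_{k,M}\, k^{2-Mn} \Big( \mathbb{E}\Big[\Big(\sum_{i,j} \xi_i \mu_j a_{i,j}\Big)^2\Big] - \sum_{i,j} \xi_i^2 \mu_j\, \mathbb{E}[a_{i,j}] \Big)
= Z_{k,M}\, k^{2-Mn} \Big( \mathbb{E}\Big[\Big(\sum_{i,j} \xi_i \mu_j a_{i,j}\Big)^2\Big] - \frac{Mn}{k} \Big). \qquad (28)
\]

For the second equality above, we used Lemma 7 and $\sum_{i,j} \xi_i^2 \mu_j = \big(\sum_i \xi_i^2\big)\big(\sum_j \mu_j\big) = n$, since
$\|\xi\|_2 = 1$.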

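To determine $\mathbb{E}\big[\big(\sum_{i,j} \xi_i \mu_j a_{i,j}\big)^2\big]$, we write

\[
\mathbb{E}\Big[\Big(\sum_{i,j} \xi_i \mu_j a_{i,j}\Big)^2\Big]
= \sum_{i,j} \xi_i^2 \mu_j^2\, \mathbb{E}[a_{i,j}^2]
+ \sum_{(i,j) \ne (i',j')} \xi_i \xi_{i'} \mu_j \mu_{j'}\, \mathbb{E}[a_{i,j} a_{i',j'}]. \qquad (29)
\]

As argued for $\mathbb{E}[a_{i,j}]$, the expected value $\mathbb{E}[a_{i,j}^2]$ is also a constant independent of $(i,j)$. With
this, the first term on the right-hand side (RHS) of (29) can be expressed as

\[
\sum_{i,j} \xi_i^2 \mu_j^2\, \mathbb{E}[a_{i,j}^2] = \mathbb{E}[a_{1,1}^2] \sum_j \mu_j^2, \qquad (30)
\]

using $\|\xi\|_2 = 1$.

Turning our attention to the second term on the RHS of (29), we note that for $(i,j) \ne (i',j')$, by
virtue of the invariance of $Q_{k,M}$ with respect to row and column permutations,

\[
\mathbb{E}[a_{i,j} a_{i',j'}] =
\begin{cases}
\mathbb{E}[a_{1,1} a_{2,2}] & \text{if } i \ne i' \text{ and } j \ne j', \\
\mathbb{E}[a_{1,1} a_{1,2}] & \text{otherwise.}
\end{cases} \qquad (31)
\]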
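Now,

\[
\mathbb{E}[a_{1,1} a_{1,2}] = \frac{1}{k-1} \sum_{j=2}^{k} \mathbb{E}[a_{1,1} a_{1,j}]
= \frac{1}{k-1}\, \mathbb{E}[a_{1,1}(M - a_{1,1})]
= \frac{1}{k-1} \Big[ \frac{M^2}{k} - \mathbb{E}[a_{1,1}^2] \Big], \qquad (32)
\]

where we used Lemma 7 to get the last equality. By a similar argument,

\[
\mathbb{E}[a_{1,1} a_{2,2}] = \frac{1}{k-1} \sum_{i=2}^{k} \mathbb{E}[a_{1,1} a_{i,2}]
= \frac{1}{k-1}\, \mathbb{E}[a_{1,1}(M - a_{1,2})]
= \frac{1}{k-1} \Big[ \frac{M^2}{k} - \mathbb{E}[a_{1,1} a_{1,2}] \Big]
= \frac{1}{(k-1)^2} \Big[ \frac{k-2}{k} M^2 + \mathbb{E}[a_{1,1}^2] \Big], \qquad (33)
\]

this time using (32) to get the last equality. Plugging (30)-(33) into (29), and then performing some
careful book-keeping (using $\sum_i \xi_i = 0$, $\|\xi\|_2 = 1$, $\sum_j \mu_j = n$ and $U = \sum_j \mu_j^2$), we eventually obtain

\[
\mathbb{E}\Big[\Big(\sum_{i,j} \xi_i \mu_j a_{i,j}\Big)^2\Big] = \frac{k(kU - n^2)}{(k-1)^2} \cdot \mathrm{Var}_{k,M}(a_{1,1}),
\]

and hence,

\[
\gamma''(0) = Z_{k,M}\, k^{2-Mn} \Big[ \frac{k(kU - n^2)}{(k-1)^2} \cdot \mathrm{Var}_{k,M}(a_{1,1}) - \frac{Mn}{k} \Big]. \qquad (34)
\]

Finally, by Proposition 2, we have

\[
G_{k,M}(t) = \mathrm{perm}_{B,M}(\Theta(\psi; p(t))) = \big[ (M!)^{2k-k^2}\, \gamma(t) \big]^{1/M}.
\]

Taking the derivative with respect to $t$ and setting $t = 0$, we obtain

\[
G'_{k,M}(0) = \big[ (M!)^{2k-k^2} \big]^{1/M}\, \frac{1}{M}\, \gamma(0)^{\frac{1}{M}-1}\, \gamma'(0) = 0,
\]

since $\gamma'(0) = 0$; see (27).

Similarly, calculating the second derivative at $t = 0$ (and again using $\gamma'(0) = 0$), we get

\[
G''_{k,M}(0) = \big[ (M!)^{2k-k^2} \big]^{1/M}\, \frac{1}{M}\, \gamma(0)^{\frac{1}{M}-1}\, \gamma''(0).
\]

Plugging in the expressions for $\gamma(0)$ and $\gamma''(0)$ given in (26) and (34), respectively, we obtain the
expression for $G''_{k,M}(0)$ recorded in the statement of Lemma 4.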

Appendix C: Proof of Lemma 6

Part (1) of the lemma is obvious, so we concern ourselves with part (2). Here, there are two
claims that need proof:

Claim 1. The discriminant $D$ is strictly negative iff $T < \frac{\sqrt{n}+1}{\sqrt{n}-1}$.

Claim 2. When $D \ge 0$, so that real roots $\rho_1$ and $\rho_2$ exist, we have $1 < \rho_1 \le 2$ and $T - 1 \le
\rho_2 < T$.

Proof of Claim 1. Note that $D = (n^2 + 2n - U)^2 - 4n^3 < 0$ iff $|n^2 + 2n - U| < 2n^{3/2}$. We may
remove the absolute value in the latter condition since $n^2 \ge U$. Thus, $D < 0$ iff $n^2 + 2n - U <
2n^{3/2}$, which upon some re-arrangement becomes $U - n > (n - \sqrt{n})^2$. Thus, recalling that
$T = \frac{n^2 - n}{U - n}$, we see that $D < 0$ is equivalent to

\[
T < \frac{n^2 - n}{(n - \sqrt{n})^2}.
\]

Upon cancelling the common factor $\sqrt{n}(n - \sqrt{n})$, the right-hand side simplifies to $\frac{\sqrt{n}+1}{\sqrt{n}-1}$. □


Proof of Claim 2. We start by writing $\rho_1 = 1 + \frac{n^2 - U - \sqrt{D}}{2(U-n)}$. To show that $\rho_1 > 1$, it suffices to
show that $n^2 - U > \sqrt{D}$, or equivalently, $(n^2 - U)^2 > D$. Routine algebra shows that $(n^2 - U)^2 -
D = 4n(U - n)$, which is of course positive.

For the rest of this proof, it will be convenient to define $a = U - n$, $b = U + n^2 - 2n$ and $c = n^2$,
so that $q(x) = ax^2 - bx + c$. With this, $\rho_1 = \frac{b - \sqrt{b^2 - 4ac}}{2a}$.

We now give a proof for $\rho_1 \le 2$. Suppose that $\rho_1 \ge 2$. We would then have $b - 4a \ge
\sqrt{b^2 - 4ac}$. This yields two inequalities that must necessarily be satisfied: $b - 4a \ge 0$ and $(b -
4a)^2 \ge b^2 - 4ac$. The latter inequality simplifies to $4a - 2b + c \ge 0$, so that the two inequalities
that must be satisfied are:

\[
b - 4a \ge 0 \quad \text{and} \quad 4a - 2b + c \ge 0. \qquad (35)
\]

Plugging in the expressions for $a$, $b$ and $c$, we obtain $n^2 + 2n \ge 3U$ and $2U \ge n^2$. Combining these
inequalities, we get

\[
\tfrac{1}{2} n^2 \le U \le \tfrac{1}{3}(n^2 + 2n), \qquad (36)
\]

and hence, $\frac{1}{2} n^2 \le \frac{1}{3}(n^2 + 2n)$. Upon re-arrangement, this becomes $n^2 - 4n \le 0$, which yields
$0 \le n \le 4$.

As we do not consider pattern lengths $n < 2$, we must deal with $n = 2, 3, 4$. Plugging these
values of $n$ into (36), we obtain $(n,U) = (2,2)$, $(3,5)$ and $(4,8)$ as the only valid solutions.
The assumption of part (2) of the lemma is that $U > n$, so we cannot have $(n,U) = (2,2)$. Also,
$(n,U) = (3,5)$ is not possible as this yields a negative discriminant. We are thus forced to conclude
that if $\rho_1 \ge 2$, then $(n,U) = (4,8)$. Indeed, in this case, we have $q(x) = 4x^2 - 16x + 16$, so that
$\rho_1 = \rho_2 = 2$. We have thus proved that $\rho_1 \le 2$ always holds, and in fact, it holds with equality iff
$(n,U) = (4,8)$, i.e., $\psi = 1122$.

To prove that $T - 1 \le \rho_2 < T$, consider the difference $\rho_2 - T$. It may be verified that this
difference can be expressed as

\[
\rho_2 - T = \frac{1}{2(U-n)} \Big[ -(n^2 - U) + \sqrt{(n^2 - U)^2 - 4n(U-n)} \Big],
\]

which is obviously strictly negative, given that $U > n$ and $n^2 \ge U$. Thus, $\rho_2 < T$.

Now, suppose $\rho_2 - T \le -1$. Then, using the expression given above for $\rho_2 - T$, we
must have $\sqrt{(n^2 - U)^2 - 4n(U-n)} \le n^2 + 2n - 3U$. This yields two inequalities to be satisfied:
$n^2 + 2n - 3U \ge 0$ and $(n^2 - U)^2 - 4n(U-n) \le (n^2 + 2n - 3U)^2$. The latter inequality can
be manipulated into the following equivalent form: $4(U - n)(2U - n^2) \ge 0$. Since $U > n$, this
inequality is satisfied iff $2U \ge n^2$. Thus, $\rho_2 - T \le -1$ only if the inequalities in (36) hold.
As argued earlier, these inequalities are satisfied only if $(n,U) = (4,8)$, in which case it may be
verified that $\rho_2 - T = -1$. This proves that $\rho_2 \ge T - 1$ always holds, and again, it holds with
equality iff $(n,U) = (4,8)$, i.e., $\psi = 1122$. □

This completes the proof of Lemma 6.
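
Both claims can be spot-checked numerically over integer pairs $(n, U)$ with $n < U \le n^2$. The following is a minimal sketch, not part of the proof: the function name is hypothetical, it assumes $T = (n^2 - n)/(U - n)$ and $q(x) = ax^2 - bx + c$ as above, and it uses a small floating-point tolerance at the equality case $(n,U) = (4,8)$.

```python
from math import sqrt

def check_lemma6_part2(n, U, tol=1e-9):
    """Check Claims 1 and 2 for one pair (n, U) with n < U <= n^2."""
    a, b, c = U - n, U + n * n - 2 * n, n * n
    D = b * b - 4 * a * c          # equals (n^2 + 2n - U)^2 - 4n^3
    T = (n * n - n) / (U - n)
    bound = (sqrt(n) + 1) / (sqrt(n) - 1)
    claim1 = (D < 0) == (T < bound)
    if D < 0:
        return claim1
    rho1 = (b - sqrt(D)) / (2 * a)
    rho2 = (b + sqrt(D)) / (2 * a)
    claim2 = (1 < rho1 <= 2 + tol) and (T - 1 - tol <= rho2 < T)
    return claim1 and claim2
```

At $(n,U) = (4,8)$ the check sits exactly on the boundary: $D = 0$, $\rho_1 = \rho_2 = 2$ and $T = 3$.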


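Appendix D: Proof of Lemma 11

Throughout the proof, we fix $k \ge 3$. We introduce some convenient notation: $\rho := M/k$,
$\epsilon_{i,j} := u_{i,j} - \rho$ and $f(x) := x(M+1-x)$. Now, consider the ratio $w(A)/w(U) = w(U+T)/w(U)$:

\[
\frac{w(U+T)}{w(U)} = \prod_{i,j} \frac{(M - (u_{i,j} + t_{i,j}))!}{(M - u_{i,j})!} \cdot \frac{u_{i,j}!}{(u_{i,j} + t_{i,j})!}
= \prod_{(i,j):\, t_{i,j} < 0} f(u_{i,j} + t_{i,j} + 1) \cdots f(u_{i,j}) \cdot \prod_{(i,j):\, t_{i,j} > 0} \frac{1}{f(u_{i,j} + 1) \cdots f(u_{i,j} + t_{i,j})}
\]
\[
= \prod_{(i,j):\, t_{i,j} < 0} \frac{f(u_{i,j} + t_{i,j} + 1) \cdots f(u_{i,j})}{f(\rho)^{|t_{i,j}|}} \cdot \prod_{(i,j):\, t_{i,j} > 0} \frac{f(\rho)^{t_{i,j}}}{f(u_{i,j} + 1) \cdots f(u_{i,j} + t_{i,j})}.
\]

The last equality above holds because $\sum_{i,j} t_{i,j} = 0$. Thus,

\[
\log \frac{w(U+T)}{w(U)} = \sum_{(i,j):\, t_{i,j} < 0}\ \sum_{\ell = -|t_{i,j}|+1}^{0} \log \frac{f(u_{i,j} + \ell)}{f(\rho)}
\ -\ \sum_{(i,j):\, t_{i,j} > 0}\ \sum_{\ell = 1}^{t_{i,j}} \log \frac{f(u_{i,j} + \ell)}{f(\rho)}. \qquad (37)
\]

Thus, we need estimates for the summands $\log \frac{f(u_{i,j} + \ell)}{f(\rho)}$. Observe that, since $u_{i,j} + t_{i,j} = a_{i,j} \ge
0$, the integers $\ell$ that appear in (37) all satisfy $u_{i,j} + \ell \ge 1$. We will make use of this observation a
little later.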
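
The identity (37) can be sanity-checked numerically on a small example. The sketch below is illustrative, with hypothetical function names; it assumes $w(A) = \prod_{i,j} (M - a_{i,j})!/a_{i,j}!$ and $f(x) = x(M+1-x)$, and evaluates both sides in floating point.

```python
from math import lgamma, log

def f(x, M):
    """f(x) = x (M + 1 - x)."""
    return x * (M + 1 - x)

def log_w(A, M):
    """log w(A) = sum_{i,j} [log (M - a_ij)! - log a_ij!], via lgamma."""
    return sum(lgamma(M - a + 1) - lgamma(a + 1) for row in A for a in row)

def log_ratio_via_37(A, U, M):
    """Right-hand side of (37) for T = A - U, with rho = M / k."""
    k = len(A)
    rho = M / k
    total = 0.0
    for i in range(k):
        for j in range(k):
            t = A[i][j] - U[i][j]
            if t < 0:
                # l runs over -|t| + 1, ..., 0
                total += sum(log(f(U[i][j] + l, M) / f(rho, M)) for l in range(t + 1, 1))
            elif t > 0:
                # l runs over 1, ..., t
                total -= sum(log(f(U[i][j] + l, M) / f(rho, M)) for l in range(1, t + 1))
    return total
```

With $k = 3$, $M = 4$, $U$ the circulant matrix with first row $(2,1,1)$ and $A = 4I$, both sides equal $\log(8/27)$.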

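We first derive useful estimates for the ratios $\frac{f(u_{i,j} + \ell)}{f(\rho)}$. Note that $f(x_2) - f(x_1) = (x_2 - x_1)(M +
1 - (x_1 + x_2))$. Using this and the fact that $M = k\rho$, we can write, for $\ell \in \mathbb{Z}$:

\[
\frac{f(u_{i,j} + \ell)}{f(\rho)} = 1 + \frac{f(u_{i,j} + \ell) - f(\rho)}{f(\rho)}
= 1 + \frac{(\ell + \epsilon_{i,j})\big((k-2)\rho - \ell - \epsilon_{i,j} + 1\big)}{\rho\big((k-1)\rho + 1\big)}
= 1 + \frac{k-2}{k-1} \cdot \frac{\ell + \epsilon_{i,j}}{\rho}\, \big(1 + \gamma(\ell)\big),
\]

where $1 + \gamma(\ell) = \Big(1 - \frac{\ell + \epsilon_{i,j} - 1}{(k-2)\rho}\Big) \Big(1 + \frac{1}{(k-1)\rho}\Big)^{-1}$. Observe that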




{k — 2)p [k — 2)p' 

using £i^j G (—1,1). On the other hand, using (l + ^ > 1 — (F=T)p’ 


(38) 


(39) 


l + 7(^) > 1- 


£ + £ip — 1 


{k-2)p 


1 - 


{k - l)p 


_ Y £ + Ej J ^ P + £ + £ij — 1 


ik-2)p {k-l){k-2)p‘^ 




> 1 - 


ik-2)p {k-l){k-2)p‘^ 

£ + £ij 

{k - 2)p 


(40) 


since, as observed earlier, £+Uij > 1 for the integers £ that appear in (|37] ). It should also be pointed 


out that the first inequality above requires 1- r^, 


{k-2)p 


to be non-negative. If 1- {k- 2 )p 


< 0 , 


then 1 + p{£) is still lower bounded by (l40l) . since we now have 1 + 7(1') = (l- (k- 2 )p ) 


1 1 1 _ 1 

{k—l)p) — _ ( fc— 2) p 


Thus, from (l^ - (l40b . we obtain for any integer £ occurring in (1371) . 

f{ui,j + £) 


fip) 


where C(^) = fEr 




= l + C{£){l+j{£)), 


and |7(f')| < Observe that for \£\ > 1, |C(f’)(l + 1{£))\ < 

which is at most 5 for -y < (Indeed, it is easy to check that 2x(l + 3a:) < ^ 
for |x| < i^(v/l5 - 2) = 0.1560....) Also, verify that |C(0)(1 + 7(0))| < ^ (^1 + (k- 2 )p ) - 
- (1 + -), which is at most | for p > 4. Henceforth, we assume p > 4 and — < |. 


(41) 


m 

p 


1 + 


3KI 


From (1441 . via the inequality x — < log(l + x) < x, valid for |x| < i, we obtain 


- [C(^)(1 + ii £))]' < log - C(^)(1 + Ii£)) < 0, 


and hence. 


C(^)7(^)-[C(^)(l + 7(^))] <log 


fip) 

f i^i,j 4” ' 


fip) 


m < Ci£)7i£)- 
It follows that




<lC(^)7(^)l + [CW(l+7(^))]' 


(42) 


Now, \C{iMi)\ < ^ (l^l+yi+2) < ^ . Furthermore, |1 + j{i)\ < 

1 + |7(^)| < 1 + < 1 + ^ + I = y|, as we have assumed ^ and p > 4. From this, we 

get |(C(^)(1 + 7(^))| < Plugging these estimates into (|4^ . we obtain 


f{p) 




+ 1 


< 4 


1^1+ 1 


P 


(43) 


From (1431) . we can deduce estimates for sums of the form log 
(IJTI) . Indeed, for an integer t < 0, with \t\ < ^p, we have 


/(My+r) ygg 

fip) 


£=-14+1 


f{uij +£) 


which yields 
0 


fip) 

fiuij + tj 1 f k — 2\ P \ 


E «<> 

£=-|i|+l 


4 A , 

- 7 ^ ^ 

^ £=-|t|+l 


+ 1)' 


fip) 


+ 


2\k-l 


[|fp + |f|(2ejj - 1)] 


£=-|i|+l 

The above bound can be brought into the following looser but simpler form: 


<^[|f|(|f| + l)(2|f|+l)]. 


^ ^ fip) 2\k-l 

£=-14+1 •’ ^ 

Similarly, for an integer 0 < f < sP^ we can obtain 


W , M 

" p2 2p 


< 


(44) 


^log 


fiuij +1) l/fc-2\/f 


?=i 


^ 4(t + l)3 3f 

- p2 2p 


(45) 


fip) 2\k-l 

Lemma 11 readily follows from (37), (44) and (45).

Appendix E: Proof of Lemma 12

As in Lemma 11 and Appendix D, we set $\rho = M/k$. We will use the notation introduced after the
statement of Lemma 11 in Section 5.2. In particular, the measure $\hat{Q}_{k,M}$ defined on the set $\overline{\mathcal{A}}_{k,M}$ is
equivalent to a discrete Gaussian measure on $\mathbb{Z}^{(k-1)^2}$. This measure assigns to each $t \in \mathbb{Z}^{(k-1)^2}$
the mass $\frac{1}{Z} \exp\big( -\frac{1}{2\sigma^2_{k,M}}\, t'Bt \big)$, where $\sigma^2_{k,M} = \frac{k-1}{k-2}\, \rho$ and $B$ is a symmetric positive definite matrix.

Let $\delta \in (0, \frac{1}{6})$ be fixed, and define

\[
\mathcal{A}_{k,M}(\delta) = \{ A \in \mathcal{A}_{k,M} : \max_{i,j} |t_{i,j}| \le \rho^{\frac{1}{2}+\delta} \},
\]

where $T = (t_{i,j}) = A - U$. We assume $\rho \ge 4$ throughout. By Lemma 11, it follows that for any
$A \in \mathcal{A}_{k,M}(\delta)$, we have

\[
\Big| \log \frac{w(A)}{w(U)} - \log \hat{w}(A) \Big| \le c_k\, \rho^{-\frac{1}{2}+3\delta},
\]

where $c_k$ is a positive constant depending only on $k$. Thus, for $A \in \mathcal{A}_{k,M}(\delta)$, we have

\[
\hat{w}(A) \exp(-c_k\, \rho^{-\frac{1}{2}+3\delta}) \le \frac{w(A)}{w(U)} \le \hat{w}(A) \exp(c_k\, \rho^{-\frac{1}{2}+3\delta}). \qquad (46)
\]

Since $-\frac{1}{2} + 3\delta < 0$, this shows that, as $M \to \infty$ (so that $\rho \to \infty$), the ratio $w(A)/w(U)$ is
well-approximated by $\hat{w}(A)$ for all $A \in \mathcal{A}_{k,M}(\delta)$. From this, we will be able to deduce that, as $M \to \infty$,
the contributions made to $\mathbb{E}[\|A-U\|^2]$ and $\hat{\mathbb{E}}[\|A-U\|^2]$ by matrices $A \in \mathcal{A}_{k,M}(\delta)$ are nearly
the same. The next two lemmas show that the matrices outside $\mathcal{A}_{k,M}(\delta)$ make vanishingly small
contributions to both the expected values.


Lemma 15. Let Bk^M{5) = Ak^M \ Ak^ui^)- IFc have 

^ IIA - UfQk,M{A) < Kk exp > 

A&Bk,M{S) 

where Kk is a constant depending only on k. 

Proof. For any A G Ak^M^ if the entries of T = A — U are. bounded above in magnitude by ^2+*^, 
then A must have non-negative entries, and hence, A G Ak,M{S)- Therefore, for any A G Bk,M{S), 
we have ||^ - C/|p = Hence, 

^ \\A-UfQk,M{A)< \\A-UfQk,MiA) 

AdBkM^) A&Ak,M--\\A-UW^>p^+^^ 

and the lemma follows by applying Proposition [19] in Appendix F with R = and t = p □ 


Lemma 16. Let 13k,Mi^) = Ak,M \ Ak,Mi^)- There is a positive constant c'^ depending only on k 
such that for all A G Bk,Mi^)> bound 


w{A) 

MU) 


< exp(-Cfcp^^) 


(47) 


holds for all sufficiently large M. Consequently, there is a positive constant df depending only on 
k such that 

E 11^ - ^11' (48) 

AeBk,M{S) 

holds for all sufficiently large M. 


Proof Given the bound in (|47]) . the bound in (|4^ follows readily. Indeed, note that for any A G 
Ak,M^ we have ||A — C/|p < ||A|p < k‘^M‘^. Also, note that \Bk,M{^)\ < |•^A:,M| < {AI + 1)^^. 
Therefore, 

E 11^ - (^11' - ^'4T2|Hfc,M(<5)| exp(-c',p2^) 

AGBk,M(S) ^ ' 

< exp(-Cfcp2'^ + 0(logM)), 

from which (|4^ follows. 

The proof of (l47l) builds on (iTTl) . Recall that A — U = T, and note that f{x)/f{p) > 1 iff 
p<x<M + l — p. For all {i,j), define fjj = min{|fjj|, pz+^j. Then, for tij < 0, we have 


E 


f{uij + £) 


0 


rw Ap) 

Also, for any Lj > 0 such that Oij = Uij + Lj < M + 1 — p, we have 


f{uij +£) 


(49) 


Elog 


fjuij +£) 

flp) 


> 




f{Ui,j + 
fip) 


(50) 
Now, consider $A \in \mathcal{B}_{k,M}(\delta)$. Suppose first that $a_{i,j} \le M + 1 - \rho$ for all $(i,j)$. Then, from (37),

(49) and (50), we have


log 


w(A) 

w(U) 




ii,j)-U,j>0 i=l 


f{uij+£) 

fip) 


(51) 


Now arguing as in the proof of Lemma [TT] in Appendix D (in particular, using (1441) and (|45]) 1. we 
can bound the right-hand side above by 



(52) 


Now, since A E we have ^ so that the first term above is upper 

bounded by ■ Using iij < the remaining two terms are upper bounded by 

Cfcp“2+3'5 gQjjjg positive constant Ck that depends only on k. Hence, we have 


log 


w{U) - 2 


k-2 

k-1 


P^^ + CkP~^^^^- 


This proves (l47l) for A E with maxij aij < M + I — p. 

It remains to consider the case of A E Bk,M{S) with max* j aij > M + 1 — p. The problem 
here is that if Oj^ > M + 1 — p, then (l50l ) may not hold, so we are unable to use the same approach 
as above to get to (|47]) . However, what we do now is to show that for each such A, there exists an 
A E Bk,Adi^) with maxjj- aij < M + 1 — p such that w{A) < w{A). As argued above, WT\ holds 
for A; therefore, it holds for A as well. 

So, let us now prove the existence of an $\tilde{A}$ as required. Let $(i,j)$ be such that $a_{ij} > M + 1 - \rho$.
Then, since $A \in \mathcal{A}_{k,M}$, the following must hold: (i) for all $i' \ne i$ and $j' \ne j$, we have $a_{i'j} < \rho - 1$
and $a_{ij'} < \rho - 1$; and (ii) there exists some $i' \ne i$ and $j' \ne j$ such that $a_{i'j'} \ge \rho$. Now, consider the
matrix $A^{\#}$ which has the same entries as $A$, except for the following: $a^{\#}_{ij} = a_{ij} - 1$, $a^{\#}_{i'j} = a_{i'j} + 1$,
$a^{\#}_{ij'} = a_{ij'} + 1$ and $a^{\#}_{i'j'} = a_{i'j'} - 1$. Note that $A^{\#}$ is also in $\mathcal{B}_{k,M}(\delta)$.

Let $a_1, \ldots, a_k$ and $a^{\#}_1, \ldots, a^{\#}_k$ denote the rows of $A$ and $A^{\#}$, respectively. Clearly, $a_\ell = a^{\#}_\ell$ for
all $\ell \notin \{i, i'\}$. Moreover, it can be directly verified using the definition of the function $\phi$ in (19) that
$\phi(a_\ell) \le \phi(a^{\#}_\ell)$ for $\ell \in \{i, i'\}$. With this, we have

$$w(A) = \prod_{\ell=1}^{k} \phi(a_\ell) \le \prod_{\ell=1}^{k} \phi(a^{\#}_\ell) = w(A^{\#}).$$

Note that the procedure of obtaining $A^{\#}$ from $A$ strictly reduces the $(i,j)$th entry of $A$, and does
not create any new entries larger than $M + 1 - \rho$. If $A^{\#}$ still contains an entry larger than $M + 1 - \rho$,
we apply the procedure to $A^{\#}$ to produce a matrix $(A^{\#})^{\#}$, and so on. Carrying on in this manner,
after finitely many steps, we will obtain the desired matrix $\tilde{A}$. □
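The four-entry move used in this argument can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `reduce_large_entries`, the rule of always picking the largest entry, and the rule of taking the first admissible $(i', j')$ are assumptions made for illustration. The sketch only demonstrates that the move preserves all row and column sums while lowering the offending entry; it does not check membership in $\mathcal{B}_{k,M}(\delta)$ or the monotonicity of $w$.

```python
import numpy as np

def reduce_large_entries(A, M, rho, max_steps=10_000):
    """Apply the four-entry move until every entry is <= M + 1 - rho.

    Hypothetical helper: A is a non-negative integer matrix whose row and
    column sums all equal M; rho is the threshold parameter from the text.
    """
    A = np.array(A, dtype=int)
    threshold = M + 1 - rho
    k_rows, k_cols = A.shape
    for _ in range(max_steps):
        # Locate the largest entry; stop once it is within the threshold.
        i, j = np.unravel_index(np.argmax(A), A.shape)
        if A[i, j] <= threshold:
            return A
        # Look for (i', j') with i' != i, j' != j and entry >= rho.
        candidates = [(r, c) for r in range(k_rows) for c in range(k_cols)
                      if r != i and c != j and A[r, c] >= rho]
        if not candidates:
            raise ValueError("no admissible (i', j') found for this input")
        ip, jp = candidates[0]
        # The move: -1 at (i, j) and (i', j'), +1 at (i', j) and (i, j').
        A[i, j] -= 1
        A[ip, j] += 1
        A[i, jp] += 1
        A[ip, jp] -= 1
    raise RuntimeError("did not terminate within max_steps")

# Small illustrative input with k = 3, M = 9, rho = 3 (values chosen by hand).
A0 = [[8, 1, 0],
      [1, 4, 4],
      [0, 4, 5]]
A_tilde = reduce_large_entries(A0, M=9, rho=3)
print(A_tilde)
print(A_tilde.sum(axis=0), A_tilde.sum(axis=1))
```

Each move changes two entries in row $i$ and two in row $i'$ by opposite amounts, and likewise for columns $j$ and $j'$, which is why the margins are unchanged.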


We are now in a position to complete the proof of Lemma 12. First, we write




w{A) 

w{U) 


E 

A£Ak,M (^) 


\\A-Uf 


w{A) 

MU) 


+ E 11^-^11' 


w{A) 

w{U)' 


where Bk^M{5) is as defined in Lemma [T6| It then follows from (|4^ and (|4^ that there exists a 
positive constant ci^k depending only on k such that, for all sufficiently large M, 

E E U-Ufw(^) 
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<exp(ci,fcp-H3-5) ^ \\A-Ufw{A) 

A.GAk,M 

On the other hand, via (46) and Lemma 15, we also have, for all sufficiently large $M$,

E E 11-4-f/ir'-"' 




w{U) 


>exp(-Cfcp-H3^) ^ \\A-U\\^w{A) 

> exp(-C 2 ,A:/ 0 “ 5 + 3 < 5 ) ^ \\A - U\\^ w{A) 


A&Ak,_ 


where $c_{2,k}$ is a positive constant depending only on $k$.
Similar arguments also yield the inequalities 

exp(-C2,tp“UM) {i(A}< Y "7^ 

valid for all sufficiently large M. 

Now, note that 


E,4,a,m 11-4 - Vf ^ 


E[l|4-I7|n = 




w{A) 
k,M w{U) 


and 


E[\\A-Ur] = 


EA,Ak^j\A-ur^iA) 


( 53 ) 


(54) 


<exp(ci,fcp 2+3^) ^ w{A) (55) 

A&Ak,M 


^AeAk,M 

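Therefore, from (53)-(55), we deduce that, with $c_{3,k} = c_{1,k} + c_{2,k}$,

$$\exp\left(-c_{3,k}\,\rho^{-\frac{1}{2}+3\delta}\right) \hat{E}[\|A-U\|^2] \;\le\; E[\|A-U\|^2] \;\le\; \exp\left(c_{3,k}\,\rho^{-\frac{1}{2}+3\delta}\right) \hat{E}[\|A-U\|^2]$$

for all sufficiently large $M$. It follows that

$$\left| E[\|A-U\|^2] - \hat{E}[\|A-U\|^2] \right| = \hat{E}[\|A-U\|^2]\; O\left(\rho^{-\frac{1}{2}+3\delta}\right).$$

Since $\hat{E}[\|A-U\|^2] = O(\rho)$ by Lemma 13, we conclude that

$$\left| E[\|A-U\|^2] - \hat{E}[\|A-U\|^2] \right| = O\left(\rho^{\frac{1}{2}+3\delta}\right),$$

which proves Lemma 12.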


Appendix F: Some Properties of Discrete Gaussian Measures 

In this appendix, we consider a discrete Gaussian measure defined by $\mu(x) = \frac{1}{Z} v(x)$, where
$v(x) := \exp\{-\frac{1}{2\beta} x'Vx\}$ for $x \in \mathbb{Z}^d$, $\beta > 0$ and $V$ a symmetric positive definite matrix, and
$Z = \sum_{x \in \mathbb{Z}^d} v(x)$. Let $X$ be a random variable distributed according to the measure $\mu$. We collect
here some results on the measure $\mu$ that are used in this paper. These results are valid in the regime
where $V$ is fixed and $\beta \to \infty$.

The main tool used in the proofs in this appendix is the Poisson summation formula (see e.g., fTh 
Chapter VII, Corollary 2.6] or ll2^ Section 17]). This formula applies to functions / : ^ C, 

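with Fourier transform $\hat{f}$ defined for all $\xi \in \mathbb{R}^d$ as $\hat{f}(\xi) = \int_{\mathbb{R}^d} f(x)\, e^{i\langle \xi, x \rangle}\, dx$, such that

$$|f(x)|,\ |\hat{f}(x)| \le \frac{C}{(1+\|x\|)^{d+\delta}} \quad \text{for all } x \in \mathbb{R}^d$$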

^LemmaflTlis proved independently of LemmafT^ 


















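for some constants $C > 0$ and $\delta > 0$. For such functions $f$, the Poisson summation formula states
that

$$\sum_{x \in \mathbb{Z}^d} f(x) = \sum_{\xi \in \mathbb{Z}^d} \hat{f}(2\pi\xi). \tag{56}$$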

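By a basic fact about the Gaussian density, the function $v(x) = \exp\{-\frac{1}{2\beta} x'Vx\}$ has Fourier
transform

$$\hat{v}(\xi) = \frac{(2\pi)^{d/2}\beta^{d/2}}{\sqrt{\det V}} \exp\left(-\tfrac{1}{2}\beta\, \xi'V^{-1}\xi\right).$$

Hence, the Poisson summation formula applies, and we have

$$Z = \sum_{x \in \mathbb{Z}^d} v(x) = \sum_{\xi \in \mathbb{Z}^d} \hat{v}(2\pi\xi) = \frac{(2\pi)^{d/2}\beta^{d/2}}{\sqrt{\det V}}\, Z^*, \tag{57}$$

where we define $Z^* = \sum_{\xi \in \mathbb{Z}^d} \exp\left(-\tfrac{1}{2} 4\pi^2 \beta\, \xi'V^{-1}\xi\right)$.

Clearly, $Z^* > 1$ since the $\xi = 0$ term in the sum evaluates to 1. In fact, as $\beta \to \infty$, $Z^* \to 1$.
This is because $\xi'V^{-1}\xi$ is a positive definite quadratic form, so that $\lim_{\beta \to \infty} \exp\left(-\tfrac{1}{2} 4\pi^2 \beta\, \xi'V^{-1}\xi\right)$
equals 0 if $\xi \ne 0$, and equals 1 if $\xi = 0$. Thus, $Z \sim \frac{(2\pi)^{d/2}\beta^{d/2}}{\sqrt{\det V}}$ as $\beta \to \infty$. To estimate the rate of this
convergence, we make use of some bounds on the quadratic form $\xi'V^{-1}\xi$.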

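The matrix $V^{-1}$ can be diagonalized as $U'\Lambda^{-1}U$, where $U$ is an orthogonal matrix and $\Lambda =
\mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ is a diagonal matrix composed of the eigenvalues, $\lambda_1, \ldots, \lambda_d$, of $V$. Since $V$ is
symmetric and positive definite, its eigenvalues are all real and positive. Thus, with $\eta = U\xi$, we
have $\xi'V^{-1}\xi = \eta'\Lambda^{-1}\eta = \sum_{i=1}^{d} \frac{1}{\lambda_i} \eta_i^2$. Hence, letting $\lambda_{\min}$ and $\lambda_{\max}$ denote the smallest and
largest eigenvalues, respectively, of $V$, we have

$$\frac{1}{\lambda_{\max}} \|\xi\|^2 = \frac{1}{\lambda_{\max}} \|\eta\|^2 \;\le\; \xi'V^{-1}\xi \;\le\; \frac{1}{\lambda_{\min}} \|\eta\|^2 = \frac{1}{\lambda_{\min}} \|\xi\|^2, \tag{58}$$

the equalities on either side being due to the fact that orthogonal transformations preserve $\ell_2$ norms.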
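As a quick numerical sanity check of the two-sided bound (58), the following Python sketch evaluates the three quantities for a given symmetric positive definite $V$ and vector $\xi$. The function name `quadratic_form_bounds` and the test matrix are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def quadratic_form_bounds(V, xi):
    """Return (||xi||^2 / lam_max, xi' V^{-1} xi, ||xi||^2 / lam_min)."""
    V = np.asarray(V, dtype=float)
    xi = np.asarray(xi, dtype=float)
    lam = np.linalg.eigvalsh(V)          # eigenvalues in ascending order
    q = xi @ np.linalg.solve(V, xi)      # xi' V^{-1} xi without forming V^{-1}
    norm_sq = xi @ xi
    return norm_sq / lam[-1], q, norm_sq / lam[0]

# Hand-picked example: V has eigenvalues 1 and 3.
V = [[2.0, 1.0], [1.0, 2.0]]
rng = np.random.default_rng(0)
for _ in range(5):
    xi = rng.integers(-5, 6, size=2)
    lower, q, upper = quadratic_form_bounds(V, xi)
    print(xi, lower, q, upper)
```

The bounds are attained when $\xi$ is an eigenvector of $V$; for instance, $\xi = (1, -1)$ is an eigenvector of the example matrix for $\lambda_{\min} = 1$, so the upper bound holds with equality there.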


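Proposition 17. In the regime where $\beta \to \infty$, we have

$$Z = \frac{(2\pi)^{d/2}\beta^{d/2}}{\sqrt{\det V}} \left[ 1 + O\left( \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta \right) \right) \right].$$

Proof. Let $C_0 = \frac{2\pi^2}{\lambda_{\max}}$. It is enough to show that $Z^* = 1 + O(\exp(-C_0 \beta))$. Since $Z^* > 1$, we
need a corresponding upper bound. This is done using (58) as follows:

\begin{align*}
Z^* &\le \sum_{\xi \in \mathbb{Z}^d} \exp\left( -C_0 \beta \sum_{i=1}^{d} \xi_i^2 \right)
     = \prod_{i=1}^{d} \sum_{\xi_i \in \mathbb{Z}} \exp\left( -C_0 \beta\, \xi_i^2 \right) \\
    &= \left[ \sum_{\xi \in \mathbb{Z}} \exp\left( -C_0 \beta\, \xi^2 \right) \right]^d
     \le \left[ \sum_{\xi \in \mathbb{Z}} \exp\left( -C_0 \beta\, |\xi| \right) \right]^d.
\end{align*}

The upper bound $1 + O(\exp(-C_0 \beta))$ now follows from the geometric series summation formula.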

□ 
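The identity (57) and the bound on $Z^*$ used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the lattice sums are truncated to the box $\{-n, \ldots, n\}^d$ (the parameter `n` and the helper names are hypothetical), which is harmless here because the summands decay like a Gaussian.

```python
import numpy as np

def lattice_points(d, n):
    """All integer points of the box {-n, ..., n}^d, one point per row."""
    axes = [np.arange(-n, n + 1)] * d
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack(grid, axis=-1).reshape(-1, d).astype(float)

def partition_sums(V, beta, n=30):
    """Return (Z, leading factor, Z*) from truncated lattice sums."""
    V = np.asarray(V, dtype=float)
    d = V.shape[0]
    pts = lattice_points(d, n)
    # Z = sum_x exp(-x'Vx / (2 beta))
    q = np.einsum("ni,ij,nj->n", pts, V, pts)
    Z = np.exp(-q / (2.0 * beta)).sum()
    # Z* = sum_xi exp(-(1/2) 4 pi^2 beta xi' V^{-1} xi)
    q_inv = np.einsum("ni,ij,nj->n", pts, np.linalg.inv(V), pts)
    Z_star = np.exp(-2.0 * np.pi**2 * beta * q_inv).sum()
    # Leading factor (2 pi)^{d/2} beta^{d/2} / sqrt(det V)
    leading = (2.0 * np.pi * beta) ** (d / 2) / np.sqrt(np.linalg.det(V))
    return Z, leading, Z_star

V = [[2.0, 1.0], [1.0, 2.0]]          # hand-picked example, eigenvalues 1 and 3
beta = 1.0
Z, leading, Z_star = partition_sums(V, beta)

# Upper bound from the proof: [sum_{xi in Z} exp(-C0 beta |xi|)]^d,
# where the inner geometric sum equals (1 + r) / (1 - r) with r = exp(-C0 beta).
lam_max = np.linalg.eigvalsh(np.asarray(V)).max()
r = np.exp(-2.0 * np.pi**2 / lam_max * beta)
upper = ((1.0 + r) / (1.0 - r)) ** 2

print(Z / leading, Z_star, upper)
```

Even at $\beta = 1$ the ratio $Z / \text{leading}$ is already very close to 1, consistent with the exponentially small correction term.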

The Poisson summation formula also applies to the function $u(x) = x'Vx\, v(x)$. Recall that
if $f$ has a twice-differentiable Fourier transform, then the function $g(x) = \|x\|^2 f(x)$ has Fourier
transform $\hat{g}(\xi) = -\Delta \hat{f}(\xi)$, where $\Delta \hat{f}$ is the Laplacian of $\hat{f}$. With $f(x) = \exp(-\frac{1}{2}\|x\|^2)$, we have
$g(x) = \|x\|^2 \exp(-\frac{1}{2}\|x\|^2)$, and $\frac{1}{\beta} u(x) = g(\beta^{-1/2} V^{1/2} x)$, where $V^{1/2}$ is the symmetric positive
definite square root of $V$. We know that $\hat{f}(\xi) = (2\pi)^{d/2} \exp(-\frac{1}{2}\|\xi\|^2)$, from which straightforward
computations yield $\hat{g}(\xi) = (2\pi)^{d/2} (d - \|\xi\|^2) \exp(-\frac{1}{2}\|\xi\|^2)$. Now, via a change of variable,

$$\frac{1}{\beta}\, \hat{u}(\xi) = \frac{\beta^{d/2}}{\sqrt{\det V}}\, \hat{g}(\beta^{1/2} V^{-1/2} \xi) = \frac{Z}{Z^*} \left( d - \beta\, \xi'V^{-1}\xi \right) \exp\left( -\tfrac{1}{2} \beta\, \xi'V^{-1}\xi \right), \tag{59}$$

where we have used the identity in (57).


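Proposition 18. In the regime where $\beta \to \infty$, we have

$$E_\mu[X'VX] = \frac{1}{Z} \sum_{x \in \mathbb{Z}^d} u(x) = \beta d + O\left( \beta^2 \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta \right) \right).$$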




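Proof. Plugging (59) into the Poisson summation formula (56), we obtain

\begin{align*}
E_\mu[X'VX] &= \frac{1}{Z} \sum_{x \in \mathbb{Z}^d} u(x) = \frac{1}{Z} \sum_{\xi \in \mathbb{Z}^d} \hat{u}(2\pi\xi) \\
 &= \frac{\beta}{Z^*} \sum_{\xi \in \mathbb{Z}^d} \left( d - 4\pi^2 \beta\, \xi'V^{-1}\xi \right) \exp\left( -\tfrac{1}{2} 4\pi^2 \beta\, \xi'V^{-1}\xi \right) \\
 &= \beta d - \frac{4\pi^2 \beta^2}{Z^*} \sum_{\xi \in \mathbb{Z}^d} \xi'V^{-1}\xi\, \exp\left( -\tfrac{1}{2} 4\pi^2 \beta\, \xi'V^{-1}\xi \right). \tag{60}
\end{align*}

It only remains to show that $\sum_{\xi \in \mathbb{Z}^d} \xi'V^{-1}\xi\, \exp\left( -\tfrac{1}{2} 4\pi^2 \beta\, \xi'V^{-1}\xi \right)$ is $O\left( \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta \right) \right)$. Using
(58), the summand can be bounded as

$$\xi'V^{-1}\xi\, \exp\left( -\tfrac{1}{2} 4\pi^2 \beta\, \xi'V^{-1}\xi \right) \le \frac{1}{\lambda_{\min}} \|\xi\|^2 \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta \|\xi\|^2 \right).$$

We then have

\begin{align*}
\sum_{\xi \in \mathbb{Z}^d} \xi'V^{-1}\xi\, \exp\left( -\tfrac{1}{2} 4\pi^2 \beta\, \xi'V^{-1}\xi \right)
 &\le \frac{1}{\lambda_{\min}} \sum_{\xi \in \mathbb{Z}^d} \sum_{i=1}^{d} \xi_i^2 \prod_{j=1}^{d} \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta\, \xi_j^2 \right) \\
 &= \frac{d}{\lambda_{\min}} \left[ \sum_{\xi_1 \in \mathbb{Z}} \xi_1^2 \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta\, \xi_1^2 \right) \right] \prod_{i=2}^{d} \left[ \sum_{\xi_i \in \mathbb{Z}} \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta\, \xi_i^2 \right) \right] \\
 &= \frac{d}{\lambda_{\min}} \left[ \sum_{\xi \in \mathbb{Z}} \xi^2 \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta\, \xi^2 \right) \right] \left[ \sum_{\xi \in \mathbb{Z}} \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta\, \xi^2 \right) \right]^{d-1} \\
 &\le \frac{d}{\lambda_{\min}} \left[ \sum_{\xi \in \mathbb{Z}} \xi^2 \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta\, |\xi| \right) \right] \left[ \sum_{\xi \in \mathbb{Z}} \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta\, |\xi| \right) \right]^{d-1} \\
 &= \frac{d}{\lambda_{\min}}\, O\left( \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta \right) \right) \left[ 1 + O\left( \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta \right) \right) \right]^{d-1} \tag{61} \\
 &= O\left( \exp\left( -\frac{2\pi^2}{\lambda_{\max}} \beta \right) \right),
\end{align*}

the equality in (61) being a consequence of standard geometric series summation formulas. □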
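The exact dual expression (60) and the approximation $E_\mu[X'VX] \approx \beta d$ can be compared numerically. The following Python sketch is an illustration only: it truncates both lattice sums to the box $\{-n, \ldots, n\}^d$, and the helper names and the test matrix are assumptions, not part of the paper.

```python
import numpy as np

def lattice_points(d, n):
    """All integer points of the box {-n, ..., n}^d, one point per row."""
    axes = [np.arange(-n, n + 1)] * d
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack(grid, axis=-1).reshape(-1, d).astype(float)

def second_moment(V, beta, n=30):
    """Return E_mu[X'VX] computed directly and via the dual formula (60)."""
    V = np.asarray(V, dtype=float)
    d = V.shape[0]
    pts = lattice_points(d, n)
    # Direct computation: sum_x x'Vx v(x) / sum_x v(x).
    q = np.einsum("ni,ij,nj->n", pts, V, pts)
    w = np.exp(-q / (2.0 * beta))
    direct = (q * w).sum() / w.sum()
    # Dual computation: beta d - (4 pi^2 beta^2 / Z*) sum_xi q_inv * e(xi),
    # where e(xi) = exp(-(1/2) 4 pi^2 beta xi' V^{-1} xi) and Z* = sum_xi e(xi).
    q_inv = np.einsum("ni,ij,nj->n", pts, np.linalg.inv(V), pts)
    e = np.exp(-2.0 * np.pi**2 * beta * q_inv)
    dual = beta * d - 4.0 * np.pi**2 * beta**2 * (q_inv * e).sum() / e.sum()
    return direct, dual

V = [[2.0, 1.0], [1.0, 2.0]]   # hand-picked example with d = 2
for beta in (1.0, 2.0, 4.0):
    direct, dual = second_moment(V, beta)
    print(beta, direct, dual, beta * 2 - direct)
```

The last printed column is the gap $\beta d - E_\mu[X'VX]$, which should shrink rapidly as $\beta$ grows.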

Our final result estimates the contribution to $E_\mu[X'VX] = \frac{1}{Z} \sum_{x \in \mathbb{Z}^d} u(x)$ made by vectors $x \in \mathbb{Z}^d$
with $x'Vx > R$ for some (large) $R > 0$. To this end, define $\mathcal{B}(R) := \{x \in \mathbb{Z}^d : x'Vx > R\}$.

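Proposition 19. For any $R > 0$ and $0 < \tau < 1$, we have

$$\frac{1}{Z} \sum_{x \in \mathcal{B}(R)} u(x) \le \exp\left( -\frac{1-\tau}{2\beta} R \right) \tau^{-(\frac{d}{2}+1)} \beta d.$$

Proof. For $0 < \tau < 1$, we write

\begin{align*}
\frac{1}{Z} \sum_{x \in \mathcal{B}(R)} u(x)
 &= \frac{1}{Z} \sum_{x \in \mathcal{B}(R)} x'Vx\, \exp\left( -\frac{1-\tau}{2\beta} x'Vx \right) \exp\left( -\frac{\tau}{2\beta} x'Vx \right) \\
 &\le \exp\left( -\frac{1-\tau}{2\beta} R \right) \frac{1}{Z} \sum_{x \in \mathbb{Z}^d} x'Vx\, \exp\left( -\frac{\tau}{2\beta} x'Vx \right) \\
 &= \exp\left( -\frac{1-\tau}{2\beta} R \right) \frac{1}{\tau Z} \sum_{x \in \mathbb{Z}^d} u(\tau^{1/2} x) \\
 &= \exp\left( -\frac{1-\tau}{2\beta} R \right) \frac{1}{\tau Z} \sum_{\xi \in \mathbb{Z}^d} \tau^{-d/2}\, \hat{u}(2\pi \tau^{-1/2} \xi) \tag{62} \\
 &\le \exp\left( -\frac{1-\tau}{2\beta} R \right) \tau^{-(\frac{d}{2}+1)} \frac{\beta}{Z^*} \sum_{\xi \in \mathbb{Z}^d} d\, \exp\left( -\tfrac{1}{2} 4\pi^2 \frac{\beta}{\tau}\, \xi'V^{-1}\xi \right) \tag{63} \\
 &\le \exp\left( -\frac{1-\tau}{2\beta} R \right) \tau^{-(\frac{d}{2}+1)} \frac{\beta}{Z^*} \sum_{\xi \in \mathbb{Z}^d} d\, \exp\left( -\tfrac{1}{2} 4\pi^2 \beta\, \xi'V^{-1}\xi \right) \\
 &= \exp\left( -\frac{1-\tau}{2\beta} R \right) \tau^{-(\frac{d}{2}+1)} \beta d.
\end{align*}

In (62) above, we used the Poisson summation formula (56), and (63) follows from (59). The next inequality holds because $\tau < 1$, and the final equality uses the definition of $Z^*$. □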
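The tail bound just proved, namely that the contribution of $\{x : x'Vx > R\}$ to $E_\mu[X'VX]$ is at most $\exp(-\frac{1-\tau}{2\beta}R)\,\tau^{-(d/2+1)}\beta d$, can be compared against a direct computation. The Python sketch below is illustrative only: the lattice is truncated to a finite box, and the function name, test matrix, and parameter values are assumptions rather than anything specified in the paper.

```python
import numpy as np

def lattice_points(d, n):
    """All integer points of the box {-n, ..., n}^d, one point per row."""
    axes = [np.arange(-n, n + 1)] * d
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack(grid, axis=-1).reshape(-1, d).astype(float)

def tail_and_bound(V, beta, R, tau, n=30):
    """Return (tail contribution to E_mu[X'VX], upper bound from the text)."""
    V = np.asarray(V, dtype=float)
    d = V.shape[0]
    pts = lattice_points(d, n)
    q = np.einsum("ni,ij,nj->n", pts, V, pts)   # x'Vx for every lattice point
    w = np.exp(-q / (2.0 * beta))               # unnormalized Gaussian weights
    Z = w.sum()
    far = q > R                                 # points with x'Vx > R
    tail = (q[far] * w[far]).sum() / Z
    bound = (np.exp(-(1.0 - tau) * R / (2.0 * beta))
             * tau ** (-(d / 2 + 1)) * beta * d)
    return tail, bound

V = [[2.0, 1.0], [1.0, 2.0]]   # hand-picked example with d = 2
for R in (5.0, 10.0, 20.0):
    for tau in (0.25, 0.5, 0.75):
        tail, bound = tail_and_bound(V, beta=1.0, R=R, tau=tau)
        print(R, tau, tail, bound)
```

The free parameter $\tau$ trades the exponential decay rate in $R$ against the prefactor $\tau^{-(d/2+1)}$, so different values of $\tau$ give the tighter bound for different $R$.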